Abstract

For ill-posed inverse problems, a regularised solution can be interpreted
as a mode of the posterior distribution in a Bayesian framework.
This framework enriches the set of solutions, as other posterior estimates,
such as the posterior mean, can be used as a solution to the inverse problem.
The Bayesian formulation of an ill-posed inverse problem
is also natural for scientists as it uses a priori information in a
rigorous probabilistic framework, and the posterior distribution can
be viewed as a set of possible solutions to the considered ill-posed inverse
problem, each with a weight characterising how well it is supported
by the data and the prior information.

In this paper we study properties of Bayesian solutions to ill-posed 
inverse problems, namely consistency and the rate of convergence in 
the Ky Fan metric. We consider the cases where the error distribution 
is not necessarily Gaussian, but belongs to a particular type of mod- 
els we refer to as Generalised Linear Inverse Problems. This setting 
includes some models where the response depends on the unknown 
parameter nonlinearly. We also consider the particular case where the
unknown parameter lies on the boundary of the parameter set, and
show that the rate of convergence in this case is faster than in the
case where the unknown parameter is an interior point.

Some key words: Ky Fan metric, consistency, rates of convergence, inverse prob- 
lems, Bayesian inference, nonregular likelihood, boundary, constrained ill-posed 
inverse problem. 
1 Introduction



1.1 Ill-posed problems and regularisation 

Inverse problems encountered in nature are commonly ill-posed: their so- 
lutions fail to satisfy at least one of the three desiderata of existing, being 
unique, and being stable. Thus, in the case of linear inverse problems, the 
focus is not on a unique solution $x$ of

$$y = Ax \tag{1}$$

for a given matrix $A$ and data vector $y$, but rather on the corresponding space
of solutions.

Even when the solution $x$ to (1) exists and is unique for each possible $y$,
lack of stability means that the solution can be extremely sensitive to small 
errors, either in the observed y or in numerical computations for solving the 
equations. This has obvious deleterious consequences for the practical value 
of solutions. To circumvent this, the inverse problem is typically regularised, 
that is, re-formulated to include additional criteria, such as smoothness of 
the solution: 

$$x = \arg\min_{x:\, Ax = y} \mathrm{pen}(x),$$

where pen(x) is a suitable scalar penalty function. 
If the data are observed with error,

$$y = Ax + \text{error},$$

then, allowing for the possibility of lack of existence or uniqueness, we might
replace the natural least-squares formulation

$$x = \arg\min_x \|y - Ax\|^2$$

of the inverse problem by

$$x = \arg\min_x \left\{ \|y - Ax\|^2 + \nu\,\mathrm{pen}(x) \right\} \tag{2}$$

where $\nu$ is a positive constant determining the trade-off between accuracy and
smoothness. For further details, see ?.
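
The penalised problem (2) can be illustrated numerically. The following is a minimal sketch, assuming the quadratic penalty pen(x) = ||x||^2 (an assumption made here for illustration; the function name `tikhonov_solution` is hypothetical). With a rank-deficient matrix the unpenalised least-squares problem has infinitely many solutions, while the penalised one has a unique solution.

```python
import numpy as np

def tikhonov_solution(A, y, nu):
    """Minimise ||y - A x||^2 + nu * ||x||^2 (quadratic penalty assumed)."""
    p = A.shape[1]
    # Normal equations of the penalised problem: (A^T A + nu I) x = A^T y.
    return np.linalg.solve(A.T @ A + nu * np.eye(p), A.T @ y)

# Rank-deficient example: the two columns of A are identical,
# so y = A x determines only the sum x_1 + x_2.
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])
y = A @ np.array([3.0, 1.0])           # noise-free data generated by x = (3, 1)
x_reg = tikhonov_solution(A, y, nu=1e-6)
print(x_reg)                           # close to the minimum-norm solution (2, 2)
```

For small nu the regularised solution reproduces the data but selects the solution of smallest norm, not the vector (3, 1) that generated the data: the penalty decides what happens in the directions the data cannot see.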

Such solutions make sense, and are commonly used, whether we regard 
the error in the data used as deterministic or stochastic in nature. The least- 
squares set up is rather natural, but from a statistical perspective corresponds 
to a Gaussian likelihood, and, as we shall see below, this may be replaced 
by certain other distributions, in most cases without material change to the 
subsequent analysis. 
1.2 Inverse problems from a Bayesian perspective

Smoothness, or other 'regular' behaviour of the solution to an inverse prob- 
lem, is a prior assumption on the unknown x, information about the model 
parameters known or assumed before the data are observed. To use such 
information is thus to accept that the required solution must combine data 
with prior information. In a statistical context the best-established principle 
for doing this is the Bayesian paradigm, in which all sources of variation, 
uncertainty and error are quantified using probability. 

From this perspective, the solution to (2) is immediately recognisable:
it is the maximum a posteriori (MAP) estimate of x, the mode of its
posterior distribution in a Bayesian model in which the data y are modelled 
with a Gaussian distribution with expectation Ax, with constant-variance 
uncorrelated errors, and in which the prior distribution of x has negative 
log-density proportional to pen(x). 
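
To spell out this correspondence, here is a short derivation. It assumes a noise variance $\sigma^2$ and a prior density proportional to $\exp\{-\lambda\,\mathrm{pen}(x)\}$; the symbols $\sigma^2$ and $\lambda$ are introduced here for illustration only.

```latex
\begin{align*}
p(x \mid y) &\propto \exp\Big\{-\tfrac{1}{2\sigma^2}\|y - Ax\|^2\Big\}\,
               \exp\{-\lambda\,\mathrm{pen}(x)\}, \\
\arg\max_x\, p(x \mid y)
  &= \arg\min_x \Big\{\tfrac{1}{2\sigma^2}\|y - Ax\|^2 + \lambda\,\mathrm{pen}(x)\Big\}
   = \arg\min_x \big\{\|y - Ax\|^2 + 2\sigma^2\lambda\,\mathrm{pen}(x)\big\}.
\end{align*}
```

Under these assumptions the trade-off constant in (2) is $\nu = 2\sigma^2\lambda$, the ratio of the noise scale to the prior scale.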

However, the Bayesian perspective brings more than merely a different 
characterisation of a familiar numerical solution. Formulating a statistical 
inverse problem as one of inference in a Bayesian model has great appeal, 
notably for what this brings in terms of coherence, the interpretability of reg- 
ularisation penalties, the integration of all uncertainties, and the principled 
way in which the set-up can be elaborated to encompass broader features 
of the context, such as measurement error, indirect observation, etc. The 
Bayesian formulation comes close to the way that most scientists intuitively 
regard the inferential task, and in principle allows the free use of subject 
knowledge in probabilistic model building (e.g. ?; ?; ?; ?; ?). For an in- 
teresting philosophical view on inverse problems, falsification, and the role 
of Bayesian argument, see ?. Various Bayesian methods to solve inverse 
problems have been proposed (?; ?; ?; ?; ?). 

1.3 Convergence of the posterior distribution 

Mathematical analysis of inverse problems usually takes the form of asymp- 
totic arguments concerning how well the true solution (the value of x assumed 
to generate the data) can be recovered in the presence of noise, as the size 
of that noise goes to zero. In a statistical setting, the noise is a random vari- 
able, its size might be the variance, and we are concerned with convergence 
of random variables or their distributions - in the case of a Bayesian analysis, 
the focus is on the posterior distribution of x. 
In this paper, we present the rates of convergence of the posterior distribution
on a finite-dimensional parameter space for an ill-posed inverse problem
where the distribution of errors is not necessarily Gaussian. We also consider 
a particular case where the regularised solution is on the boundary of the pa- 
rameter space. As we shall see, in the case of an ill-posed inverse problem, 
the choice of the prior distribution strongly influences the limit of the poste- 
rior distribution as well as the rate of convergence on the subspace where the 
likelihood is not identified. Also, we will show that the rate of convergence 
may change if the limiting point x* lies on the boundary of the parameter 
space for a constrained inverse problem (for Gaussian noise and a Gaussian
prior, this problem has been studied by ?). We shall identify the assumptions
on the posterior distribution necessary for convergence, which can be used as
guidance to narrow down the set of potential prior distributions.

There are different approaches to quantifying the convergence rates of the
posterior distribution. One of them is to consider the concentration rate of
the almost sure convergence of the posterior distribution, which is the smallest
$\varepsilon_\sigma$ such that

$$P(d(x, x^*) > \varepsilon_\sigma \mid Y) \to 0 \quad \text{almost surely}$$

as the noise level $\sigma$ goes to 0, considered by ?.

Another approach, considered by ?; ? in the context of linear inverse
problems, is to metrise weak convergence of the posterior distribution as
a random variable $\mu_{\mathrm{post}}(\omega) = p(x \mid Y(\omega))$ using the Ky Fan metric (?); see
Section 3. This type of convergence is weaker than almost sure convergence,
and the convergence rates in this metric are slower than the parametric rate
with the mean square error loss: there is an extra logarithmic
factor in the rate which is unavoidable. The Ky Fan rate of
convergence $\varepsilon_\sigma$ satisfies, with probability at least $1 - \rho_K(Y, y_{\mathrm{exact}})$,

$$P(d(x, x^*) \le \varepsilon_\sigma \mid Y) \ge 1 - \varepsilon_\sigma \quad \text{on } \{\omega : d(Y(\omega), y_{\mathrm{exact}}) \le \rho_K(Y, y_{\mathrm{exact}})\},$$

where $\rho_K(Y, y_{\mathrm{exact}})$ is the Ky Fan distance between the data $Y$ and its small
noise limit $y_{\mathrm{exact}}$. This provides a non-asymptotic framework for the
study of convergence of the posterior distribution.

The setting for ? is the Gaussian linear inverse problem in the form
(2), with a particular quadratic penalty (Gaussian prior). Their main result
(Theorem 11) provides an upper bound on the Ky Fan metric between the
posterior distribution and its (degenerate) limit, as an explicit function of
the size of the noise, the parameters of the model and prior, and quantities
relating the prior mean to the null space of the matrix A. This result is used
to prove a limit theorem (Theorem 13) on the convergence of this Ky Fan
metric to 0, in a small-noise, high-prior-precision limit, and to give the rate
of this convergence (Theorem 15).

In this paper we consider two asymptotic properties of the posterior distri- 
bution in the small noise limit: we identify the limit of the posterior distribu- 
tion and state the rate of convergence in Ky Fan metric. As an intermediate 
step in deriving the Ky Fan rate of convergence, we have an upper bound on 
the Prokhorov metric between the posterior distribution and its limit, which
metrises weak convergence. This bound is very simple and allows us to draw
conclusions about the sufficient conditions for weak convergence. We consider
a broad class of probability distributions for the data, that we call generalised 
linear inverse problems, allowing the likelihood to be unidentifiable, and a 
broad class of prior distributions. 

We will also study the asymptotics of the posterior distribution in a par- 
ticular case where the exact solution lies on the boundary of the parameter 
space. This is the case of a so-called nonregular likelihood, since the error
density has a jump when the value of the parameter coincides with the
exact solution. Other examples of the behaviour of the posterior distribution
for nonregular likelihoods, including densities with jumps as well as other 
nonregular models, were considered by ? and ? who extended the models 
studied by ? in the frequentist setting. We consider a particular case where 
all coordinates of the exact solution are on the boundary, and show that the 
rate of convergence of the posterior distribution can be faster than for the 
regular models. 

Section 2 establishes the class of models we study. In Section 3 we discuss
the Ky Fan distance and present some examples of calculating the Ky Fan
distance for various error distributions. In Section 4 we formulate our theorems
on rates of convergence of the posterior distribution. In Section 5 we
study an inverse problem where the limit of the posterior distribution (the
regularised solution) is situated on the boundary. The proofs are deferred to
the Appendix.
2 Model formulation



2.1 Generalised linear inverse problems (GLIP) 

We assume that the joint density of the observable responses $Y$ taking values
in $\mathcal{Y} \subset \mathbb{R}^n$ (with respect to Lebesgue or counting measure) takes the form

$$p(y \mid x) = F(y, Ax, \tau) \propto \exp\{-\tilde f_y(Ax)/\tau\} \quad \text{(as a function of } x\text{)},$$

that is, that the distribution depends on $x \in \mathcal{X}$ only via $Ax$, where $\tau$ is a
scalar dispersion parameter; in the Gaussian model, $\tau$ is the variance $\sigma^2$. The
observed data $y$ are generated from this distribution, with $x = x_{\mathrm{true}}$, and we
aim to recover $x_{\mathrm{true}}$ as $\tau \to 0$.

We assume a continuous bijective link function $G : \mathcal{Y} \to \mathbb{R}^n$ and write
$G(y_{\mathrm{exact}}) = Ax_{\mathrm{true}}$. (In generalised linear models - see Example 3 below -
commonly $G$ has identical component functions.)

We make the following assumptions about the error distribution:

1. If $Y \sim F(y, G(y_{\mathrm{exact}}), \tau)$, then $Y \to y_{\mathrm{exact}}$ as $\tau \to 0$.

2. For all $\mu \in G^{-1}(A\mathcal{X})$, $\tilde f_\mu(\eta)$ has a unique minimum over $A\mathcal{X}$ at
$\eta = G(\mu)$.

Assumption 1 states that $\tau$ is not only the dispersion parameter in the
model but also a scale parameter for the distribution of $Y$. Assumption 2
establishes identifiability of the likelihood with respect to the link parameter
$\eta = Ax$.

More generally, Assumption 1 is satisfied by generalised linear models ?,
an important class of nonlinear statistical regression problems, in which responses $y_t$,
$t = 1, 2, \ldots, n$, are drawn independently from a one-parameter exponential
family of distributions in canonical form, with density or probability function

$$p(y_t; \mu_t, \tau) = \exp\left\{ \frac{y_t\, b(\mu_t) - d(\mu_t)}{\tau} + c(y_t, \tau) \right\}$$

for appropriate functions $b$, $c$ and $d$ characterising the particular distribution
family. The parameter $\tau$ is a common dispersion parameter shared by
all responses. The expectation of this distribution is $E(y_t; \mu_t, \tau) = \mu_t = d'(\mu_t)/b'(\mu_t)$.
Both assumptions are satisfied for this example.




(3) 
2.2 Bayesian formulation of GLIP

We adopt a Bayesian paradigm, using a prior distribution with density given
by

$$p(x) \propto \exp(-g(x)/\gamma^2), \quad x \in \mathcal{X} \subset \mathbb{R}^p, \tag{4}$$

where $\gamma^2$ is a scalar dispersion parameter for the prior that may depend
on $\tau$; we relate this to the data dispersion parameter $\tau$ by $\gamma^2 = \tau/\nu$, and
express most of our results below in terms of $\tau$ and $\nu$. The set of possible values
of the parameter, $\mathcal{X}$, can be any subset of $\mathbb{R}^p$ that contains a nonempty
neighbourhood of $x^*$. The posterior distribution therefore satisfies

$$p(x \mid y) \propto \exp(-[\tilde f_y(Ax) + \nu g(x)]/\tau), \quad x \in \mathcal{X}. \tag{5}$$

Denote $f_y(x) = \tilde f_y(Ax)$ and $h_y(x) = f_y(x) + \nu g(x)$, so that
$p(x \mid y) \propto e^{-h_y(x)/\tau}$.

We will show that in the limit $\tau \to 0$, the posterior distribution concentrates
at the point $x^*$ defined by

$$x^* = \arg\min_{x \in \mathcal{X}:\, Ax = Ax_{\mathrm{true}}} g(x).$$

Below we make further assumptions on the likelihood and the prior distribution
that we apply to study convergence of the posterior distribution.
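
The limit point x* can be computed in closed form in a simple special case. The sketch below assumes a quadratic g(x) = ||x - m||^2 / 2 with prior mean m and an unconstrained parameter set (both are assumptions for illustration; `limit_point` is a hypothetical helper). In that case x* is obtained by moving from m along the range of A^T until the constraint Ax = Ax_true holds.

```python
import numpy as np

def limit_point(A, x_true, prior_mean):
    """x* = argmin ||x - prior_mean||^2 subject to A x = A x_true."""
    P = np.linalg.pinv(A) @ A           # orthogonal projector onto range(A^T)
    return prior_mean + P @ (x_true - prior_mean)

A = np.array([[1.0, 1.0, 0.0]])         # rank 1: the null space is 2-dimensional
x_true = np.array([2.0, 0.0, 5.0])
x_star = limit_point(A, x_true, prior_mean=np.zeros(3))
print(x_star)                           # [1. 1. 0.]
```

The data constraint is met (A x* = A x_true = 2), but x* differs from x_true: the components of x_true in the null space of A are replaced by those of the prior mean, which is why the prior determines the limit in the unidentified directions.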

3 Types of convergence and corresponding 
distances 

Convergence in distribution (weak convergence) can be metrised by the Prokhorov
metric (?).

Definition 1. The Prokhorov metric between two measures on a metric space
$(X, d_X)$ is defined by

$$\rho_P(\mu_1, \mu_2) = \inf\{\varepsilon > 0 : \mu_1(B) \le \mu_2(B^\varepsilon) + \varepsilon \ \ \forall \text{ Borel } B\},$$

where $B^\varepsilon = \{x : \inf_{z \in B} d_X(x, z) < \varepsilon\}$.
This metric can be used to study the weak convergence of the posterior
distribution $\mu_{\mathrm{post}}(\omega) = P_{X \mid Y(\omega)}$ as a measure on $\mathcal{X}$ to its limit for a fixed data
set $Y(\omega)$. We consider the Euclidean metric $d(x, z) = \|x - z\|$ on $\mathcal{X}$.

To study weak convergence of the posterior distribution to its limit over
all $\omega$, we can use the Ky Fan metric, which metrises convergence in probability (?).

Definition 2. The Ky Fan metric between two random variables $\xi_1$ and $\xi_2$
in a metric space $(W, d_W)$ is defined by

$$\rho_K(\xi_1, \xi_2) = \inf\{\varepsilon > 0 : P(d_W(\xi_1(\omega), \xi_2(\omega)) > \varepsilon) < \varepsilon\}.$$

Hence, weak convergence of the posterior distribution $\mu_{\mathrm{post}}$ (as a random
variable) to $\delta_{x^*}$, the point mass at $x^*$, is equivalent to its convergence in the
Ky Fan metric, where the metric space $(W, d_W)$ is a space of probability
distributions on $\mathcal{X}$ equipped with the Prokhorov metric.
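
Definition 2 can be evaluated numerically in simple cases. The following sketch computes the Ky Fan distance between a scalar Gaussian variable Y ~ N(mu, sigma^2) and its mean by bisection; the function name and the reference quantity sqrt(-sigma^2 log sigma^2) printed alongside are choices made here for illustration, not constants taken from the paper.

```python
import math

def ky_fan_gaussian(sigma, tol=1e-12):
    """Ky Fan distance between Y ~ N(mu, sigma^2) and the constant mu.

    Solves P(|Y - mu| > eps) = eps, i.e. erfc(eps / (sigma * sqrt(2))) = eps.
    """
    def excess(eps):
        return math.erfc(eps / (sigma * math.sqrt(2.0))) - eps

    lo, hi = 0.0, 1.0                  # excess(0) = 1 > 0 and excess(1) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi

for sigma in (1e-1, 1e-2, 1e-3):
    rho = ky_fan_gaussian(sigma)
    reference = math.sqrt(-sigma**2 * math.log(sigma**2))
    print(f"sigma={sigma:g}  rho_K={rho:.6f}  sqrt(-s^2 log s^2)={reference:.6f}")
```

The tail probability is decreasing in eps while the right-hand side is increasing, so the infimum in the definition is the unique crossing point. The printed values shrink slightly more slowly than sigma itself, which is the logarithmic factor discussed in the Introduction.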

Now we give the Ky Fan distance or its upper bound for some distribu- 
tions. 

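For the Gaussian distribution, we quote Lemma 7 from ?.

Lemma 1. Let $\xi \sim \mathcal{N}_p(\mu, \Sigma)$. Define

$$C_p = \begin{cases} 2\pi/(p+1)^2 & \text{if } p \text{ is odd}, \\ 2^p/p & \text{if } p \text{ is even}, \end{cases} \tag{6}$$

$$\kappa_p = \max\{1, p - 2\}. \tag{7}$$

Then there exists a positive constant $\delta(p)$ such that for any $\Sigma$ with $\|\Sigma\| < \delta(p)$,

$$\rho_P(\mathcal{N}(\mu, \Sigma), \delta_\mu) \le \left[-\|\Sigma\| \log\{C_p \|\Sigma\|^{\kappa_p}\}\right]^{1/2}. \tag{8}$$

In particular, we will use the following bound on the solution $z = z(p, \sigma)$ of

$$z(p, \sigma) = \inf\left\{z : 1 - \Gamma\left(\frac{z^2}{2\sigma} \,\Big|\, \frac{p}{2}\right) \le z\right\},$$

given in the proof of this lemma for sufficiently small $\sigma$:

$$z(p, \sigma) \le \left[-\sigma \log(C_p \sigma^{\kappa_p})\right]^{1/2}. \tag{9}$$

Here $\Gamma(x \mid a)$ is the cumulative distribution function of the Gamma distribution
$\Gamma(a, 1)$ with probability density function $f(x) = \frac{1}{\Gamma(a)} x^{a-1} e^{-x}$, $x > 0$.

Now we consider a rescaled Poisson distribution.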
Lemma 2. Consider independent random variables $Y_t/\tau \sim \mathrm{Pois}(\mu_t/\tau)$,
$t = 1, \ldots, n$, $\mu_t > 0$. Denote $M = 4\sum_t \mu_t$.
Then,

$$\rho_K(Y, \mu) = \sqrt{-\tau M \log(\tau M)}\,(1 + w_\tau),$$

where $w_\tau = o(1)$ as $\tau \to 0$ and $w_\tau \le 0$.

Note that the Ky Fan distance has the same asymptotic order as for the
Gaussian distribution with $\Sigma = \tau\Sigma_0$, where $\Sigma_0$ is independent of $\tau$, as $\tau \to 0$.

Now, if we consider the exponential distribution with variance proportional
to $\tau^2$, the order of the Ky Fan distance is different. Let
$Y - \mu \sim \mathrm{Exp}(\lambda/\tau)$; then $EY = \mu + \tau/\lambda$ and $\mathrm{Var}(Y) = \tau^2/\lambda^2$. As $\tau \to 0$, $Y \to \mu$ in
probability. The Ky Fan distance is given by

$$\rho_K(Y, \mu) = -\frac{\tau}{\lambda} \log\left(\frac{\tau}{\lambda}\right)(1 + w_\tau),$$

where $w_\tau \le 0$ and $w_\tau = o(1)$ as $\tau \to 0$. This follows from Lemma 5.

Now we give some general statements on an upper bound on the Ky Fan 
distance for various distributions. 

Proposition 1. Assume that $Y_t$ are independent, $EY_t = \mu_t$ and $\mathrm{Var}(Y_t) = w_t\tau$.

1. Assume that $\exists C_t \ge 1$ such that $\kappa_{t,k}$, the $k$th cumulant of $Y_t$, is bounded
by $|\kappa_{t,k}| \le C_t w_t \tau^{k-1}$ $\forall k \ge 2$, and $C_t$ and $w_t$ are independent of $\tau$.
Denote $M = 4\sum_t C_t w_t$.

Then, for $\tau < 1/(eM)$,

$$\rho_K(Y, \mu) \le \sqrt{-\tau M \log(\tau M)}.$$

2. Assume that $\exists K \ge 2$: $E|Y_t|^K < \infty$. Assume that
$E|Y_t - \mu_t|^K \le \tau^{m(K)} L_K$ for some $L_K > 0$ that may depend on $\mu_t$ or $w_t$ but not on $\tau$,
for some $m(K) > 0$.

Then, for small enough $\tau$,

$$\rho_K(Y, \mu) \le \left[n \tau^{m(K)} L_K\right]^{1/(K+1)}.$$

The conditions in the first case are satisfied, for example, for the binomial
distribution $Y_t \sim \mathrm{Bin}(n_t, p_t)$, independently, since
$c_t(x) = n_t \log(p_t e^x + q_t) \le n_t p_t (e^x - 1)$.

Here is an example for the second case. 
Example 1. Suppose $Y_t$ has a $t$ distribution with $\nu$ degrees of freedom, means
$\mu_t$ and scales $\sqrt{\tau}\, w_t$, $t = 1, \ldots, n$. Then we can take $K = \nu - 2 - \delta$ for some
small $\delta > 0$. Then, using the second statement of Proposition 1,

$$E|Y_t - \mu_t|^K = [\sqrt{\tau}\, w_t]^K \nu_K,$$

where $\nu_K$ is the $K$th moment of the standard $t_\nu$ distribution, i.e. $m(K) = K/2$
and $L_K = w_t^K \nu_K$. Hence,

Note that this bound holds if $Y_t$ can be written as $Y_t = \mu_t + \sigma w_t Z_t$ where
$Z_t$ are iid and whose distribution is independent of $\tau$.

4 Rates of convergence of posterior distribution
in Ky Fan metric

Denote by $\mu_{\mathrm{post}}(\omega)$ the posterior distribution of $X$ given $y = Y(\omega)$. We
consider the metric space $(\mathcal{X}, \ell_2)$ equipped with the Euclidean metric
$\|x - z\| = \sqrt{\sum_{i=1}^p (x_i - z_i)^2}$, $\mathcal{X} \subset \mathbb{R}^p$. Then, the posterior measure $\mu_{\mathrm{post}}(\omega)$ can be
viewed as a measure on the metric space $(\mathcal{X}, \ell_2)$. The corresponding metric
space for the observations is $(\mathcal{Y}, \ell_2)$, $\mathcal{Y} \subset \mathbb{R}^n$, equipped with the metric generated
by the $\ell_2$ norm.

In the next section we evaluate the level of concentration of the posterior
distribution $\mu_{\mathrm{post}}$ around $x^*$. We start with the concentration of the posterior
distribution $\mu_{\mathrm{post}}(\omega)$ for a fixed $\omega$ (i.e. for a particular data set) in the
Prokhorov metric, and then, using the lifting theorem (Theorem 2), we use
the bounds thus obtained to derive a bound on the Ky Fan distance between
the posterior distribution and the limit over all $\omega$. In the results below, it is
assumed that the dimension $p$ is fixed and is independent of $\tau$.

Throughout the section, we assume that $x^*$ is an interior point of $\mathcal{X}$.

4.1 Assumptions on the likelihood and the prior 

We will make the two main assumptions that the posterior distribution is 
proper and that the log likelihood and log prior density have bounded third 
order derivatives. 
Throughout, we use $\nabla_i = \partial/\partial x_i$ as the differentiating operator, and
$\nabla = (\nabla_1, \ldots, \nabla_p)^T$ as the gradient. Similarly, $\nabla_{ij}$ and $\nabla_{ijk}$ are operators of the
second and third derivatives, with $\nabla^2 = (\nabla_{ij})$ being the matrix of second
derivatives.

Assumptions on prior distribution.

We assume that the prior distribution is such that the posterior distribution
is proper.

1. $\exists \tau_0 > 0$: $\forall \tau \le \tau_0$, $\int_{\mathcal{X}} e^{-h_y(x)/\tau}\, dx < \infty$ for all $y \in \mathcal{Y}$.

2. $x^* = \arg\min_{x \in \mathcal{X}:\, Ax = Ax_{\mathrm{true}}} g(x)$ is the unique solution of the minimisation
problem.

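Smoothness in x.

There exists $\delta > 0$ such that there exist bounded third order derivatives
$f_y'''$, $g'''$ on $B(x^*, \delta)$ for all $y \in \mathcal{Y}_{\mathrm{loc}}$, i.e. $\exists C_{f,3}, C_{g,3} < \infty$ such that for all
$x \in B(x^*, \delta)$, for all $y \in \mathcal{Y}_{\mathrm{loc}}$ and all $1 \le i, j, k \le p$,

$$|\nabla_{ijk} f_y(x)| \le C_{f,3}, \quad |\nabla_{ijk} g(x)| \le C_{g,3}, \tag{10}$$

where $\mathcal{Y}_{\mathrm{loc}}$ is the following neighbourhood of $y_{\mathrm{exact}}$ in $\mathcal{Y}$:

$$\mathcal{Y}_{\mathrm{loc}} = \{y \in \mathcal{Y} : \|y - y_{\mathrm{exact}}\| \le \rho_K(Y, y_{\mathrm{exact}})\} \tag{11}$$

and $\rho_K(Y, y_{\mathrm{exact}})$ is the Ky Fan distance between $Y$ and $y_{\mathrm{exact}}$. By the definition
of the Ky Fan distance, $P(\mathcal{Y}_{\mathrm{loc}}) \ge 1 - \rho_K(Y, y_{\mathrm{exact}})$.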
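Convergence in Y.

$\exists M_{f,1}, M_{f,2} < \infty$ such that for all $1 \le j_1, \ldots, j_d \le p$ with $d = 1, 2$, and
for all $y \in \mathcal{Y}_{\mathrm{loc}}$,

$$|\nabla_{j_1, \ldots, j_d} f_y(x^*) - \nabla_{j_1, \ldots, j_d} f_{y_{\mathrm{exact}}}(x^*)| \le M_{f,d} \|y - y_{\mathrm{exact}}\|. \tag{12}$$

These assumptions are satisfied if $\nabla^d f_\mu(x)$ is differentiable in $\mu$ for $d = 1, 2$
and this derivative is bounded on $\mathcal{Y}_{\mathrm{loc}}$, with

$$M_{f,d} = \sup_{\mu \in \mathcal{Y}_{\mathrm{loc}}} |\nabla_\mu \nabla^d f_\mu(x^*)| \quad \text{for } d = 1, 2.$$

Assumptions on $\delta$.

Assume that $\delta > 0$ satisfies the following conditions as $\tau \to 0$: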
1.

6^0, 4=^0, ^p K (Y,y^) + u, S [PK{Y > y « xact) + u]2 -+0,(13) 

V T T 

oo (not necessary if A T A is of full rank). 

2. With high probability,

$$\Delta_0(B(0, \delta)) \to 0 \quad \text{as } \tau \to 0, \tag{14}$$

where

$$\Delta_0(B(0, \delta)) = \frac{\int_{\mathcal{X} \setminus B(x^*, \delta)} e^{-[h_y(x) - h_y(x^*)]/\tau}\, dx}{\int_{B(x^*, \delta)} e^{-[h_y(x) - h_y(x^*)]/\tau}\, dx}. \tag{15}$$

After the approximation to $e^{-[h_y(x) - h_y(x^*)]/\tau}$ on $B(x^*, \delta)$ is derived, condition
(14) will be stated in a simplified form in Lemma 3.

Throughout this section we use the error $\Delta_0 = \Delta_0(B(0, \delta))$ defined by
(15), and constants $\kappa_p$ and $C_p$ defined by (6) and (7) that feature in the upper
bound on the Ky Fan metric between the Gaussian distribution and its mean
(Lemma 1 in the Appendix).

4.2 Ky Fan distance 

The limiting behaviour of the posterior distribution is characterised by the
matrices of second derivatives:

$$V_y(x) = \nabla^2 \tilde f_y(Ax),$$

$$B(x) = \nabla^2 g(x),$$

$$H_y(x) = \nabla^2 h_y(x) = A^T V_y(x) A + \nu B(x),$$

and by the gradient:

$$x_0 = [H_y(x^*)]^{-1} \nabla h_y(x^*). \tag{16}$$

Denote the projection matrix onto the image of $A^T$ by $P_{A^T}$, and $P_{A,V} = (A^T V A)^{\dagger} A^T V A$.

Define $\lambda_{\min,\mathrm{pos}}(M)$ to be the minimum positive eigenvalue of a matrix
$M$, and $\lambda_{\min,P}(M) = \min_{\|v\| = 1,\ v = Pv} \|Mv\|$ to be the smallest eigenvalue of a
matrix $M$ on the range of a projection matrix $P$.
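
These two spectral quantities are easy to compute for a given matrix. The sketch below reads the second definition as the minimum of ||Mv|| over unit vectors v in the range of P, which is an interpretation of the formula above; the matrices A and B and both function names are hypothetical examples.

```python
import numpy as np

def min_positive_eigenvalue(M, tol=1e-10):
    """Smallest eigenvalue of a symmetric PSD matrix that exceeds tol."""
    eig = np.linalg.eigvalsh(M)
    return eig[eig > tol].min()

def min_eigenvalue_on_range(M, P, tol=1e-10):
    """Minimum of ||M v|| over unit vectors v in the range of the projector P."""
    # Orthonormal basis of range(P): eigenvectors of P with eigenvalue 1.
    w, U = np.linalg.eigh(P)
    basis = U[:, w > 1 - tol]
    # min ||M U c|| over unit c is the smallest singular value of M U.
    return np.linalg.svd(M @ basis, compute_uv=False).min()

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])
P_AT = np.linalg.pinv(A) @ A             # projector onto the image of A^T
B = np.diag([1.0, 3.0, 5.0])             # hypothetical Hessian of g at x*
lam_pos = min_positive_eigenvalue(A.T @ A)
lam_null = min_eigenvalue_on_range(B, np.eye(3) - P_AT)
print(lam_pos, lam_null)
```

Here A^T A has eigenvalues 0, 2 and 4, so the minimum positive eigenvalue is 2, and the null space of A is spanned by (1, -1, 0)/sqrt(2), on which ||Bv|| equals sqrt(5).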

For a fixed $\omega$, we have the following upper bound on the Prokhorov
distance between the posterior distribution and its limit.
Theorem 1. Suppose we have a Bayesian model given in Section 2.1 and
let the assumptions stated in Section 4.1 hold. Assume also that $x^*$ is an
interior point of $\mathcal{X}$, and that $H_{Y(\omega)}(x^*) = A^T V_{Y(\omega)}(x^*) A + \nu B(x^*)$ is of full
rank.

Then, $\exists \tau_0 > 0$ such that for $\forall \tau \in (0, \tau_0]$,

/ , \ x \ ^ J A o M fl \\Y(u)- ycxact \\ + u\\P A T Vg(x*)\\ 
pp (//post (w , 6 X *) ^ max t , T - f - — —- 



where $\lambda_{\min}(\omega) = \lambda_{\min}(H_{Y(\omega)}(x^*))$, $\Delta_0 = \Delta_0(B(0, \delta))$ is defined by (15), and $\Delta_*$
is defined by



The first term in the sum represents the bias of the posterior distribution,
and the second term is the Prokhorov distance between $\mathcal{N}(0, \tau H_{Y(\omega)}(x^*)^{-1})$
and the point mass at zero. The maximum reflects the fact that there are two
"competing" tails: Gaussian on the ball $B(x^*, \delta)$ and the tail of the posterior
distribution outside the ball.

This theorem implies that to have convergence of the posterior distribution
to $\delta_{x^*}$, we must have (a) convergence of the data so that $\|Y - y_{\mathrm{exact}}\| \to 0$
in probability, (b) $\nu = \tau/\gamma^2 \to 0$, i.e. the prior distribution needs to be rescaled in a way
dependent on the scale of the likelihood, and (c) $\tau/\lambda_{\min}(H_{Y(\omega)}(x^*)) \to 0$. If
the matrix $A^T V_{Y(\omega)}(x^*) A$ is of full rank, then, for small $\tau$, $\lambda_{\min}(H_{Y(\omega)}(x^*))$
is close to the constant $\lambda_{\min}(A^T V_{y_{\mathrm{exact}}}(x^*) A)$ with high probability, hence the
latter condition is satisfied as $\tau \to 0$. However, if $A^T V_{Y(\omega)}(x^*) A$ is not of full
rank, then, for small enough $\nu$ and $\tau$, $\lambda_{\min}(H_{Y(\omega)}(x^*)) = \nu \lambda_{\min, I - P_{A^T}}(B(x^*))$;
hence, we must have $\tau/\nu = \gamma^2 \to 0$.

This is summarised in the following corollary. 

Corollary 1. For weak convergence of the posterior distribution to the point
mass at $x^*$ as $\tau \to 0$ for a fixed $\omega$, we must have $\nu = \tau/\gamma^2 \to 0$.

1. If the matrix $A^T V_{Y(\omega)}(x^*) A$ is not of full rank, then we must also have
$\gamma \to 0$.

2. If the matrix $A^T V_{Y(\omega)}(x^*) A$ is of full rank, however, the scale of the
prior distribution $\gamma$ may be taken to be a positive constant.
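
The role of the prior scale in the ill-posed case can be seen directly in the conjugate Gaussian setting. The following minimal sketch assumes a Gaussian likelihood y ~ N(Ax, tau I) and a Gaussian prior x ~ N(0, gamma^2 I), so that V = I and B = I in the notation above (these choices and the function name are illustrative assumptions). The posterior covariance is then tau H^{-1} with H = A^T A + nu I.

```python
import numpy as np

def posterior_covariance(A, tau, gamma2):
    """Posterior covariance for y ~ N(Ax, tau I) and prior x ~ N(0, gamma2 I)."""
    nu = tau / gamma2
    H = A.T @ A + nu * np.eye(A.shape[1])    # H = A^T V A + nu B with V = I, B = I
    return tau * np.linalg.inv(H)

A = np.array([[1.0, 1.0]])                   # rank 1: direction (1, -1) is not identified
for tau in (1e-2, 1e-4, 1e-6):
    fixed = posterior_covariance(A, tau, gamma2=1.0)
    shrinking = posterior_covariance(A, tau, gamma2=np.sqrt(tau))
    print(tau, np.linalg.eigvalsh(fixed).max(), np.linalg.eigvalsh(shrinking).max())
```

In the unidentified direction the posterior variance equals tau/nu = gamma^2 whatever the noise level, so with a fixed prior scale the posterior does not contract there; it contracts only when gamma itself tends to zero.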

The theorem also implies that the rate of contraction of the posterior
distribution (in terms of Ky Fan distance) varies between $P_{A^T}\mathcal{X}$ and
$(I - P_{A^T})\mathcal{X}$ and is determined by the second derivative of the logarithm of the
posterior density.

This theorem gives an upper bound on the Prokhorov distance between
the posterior distribution and the limit for any particular instance of observed
data $Y(\omega)$. To "lift" the result obtained to a bound on the Ky Fan distance
over all $\omega$, we use the following generalisation of the lifting theorem of ? to
the case of different bounds for different outcomes $\omega$.

Theorem 2. Let random variables $X_1$, $X_2$ and $Y_1$, $Y_2$ be defined on the same
probability space $(\Omega, \mathcal{F}, P)$ with values in metric spaces $(X, d_X)$ and $(Y, d_Y)$,
respectively, and suppose the sample space $\Omega$ is partitioned into two parts,

$$\Omega = \Omega_1 \cup \Omega_2, \quad \Omega_1 \cap \Omega_2 = \emptyset.$$

Assume that there exist positive nondecreasing functions $\Phi_1$ and $\Phi_2$ such that

$$\forall \omega \in \Omega_k, \quad d_X(X_1(\omega), X_2(\omega)) \le \Phi_k(d_Y(Y_1(\omega), Y_2(\omega))), \quad k = 1, 2,$$

i.e. we have different upper bounds on $\Omega_1$ and $\Omega_2$.
Then, the following inequalities hold:

$$\rho_K(X_1, X_2) \le \max\{\rho_K(Y_1, Y_2) + P(\Omega_2),\ \Phi_1(\rho_K(Y_1, Y_2))\},$$
$$\rho_K(X_1, X_2) \le \max\{\rho_K(Y_1, Y_2),\ \Phi_1(\rho_K(Y_1, Y_2)),\ \Phi_2(\rho_K(Y_1, Y_2))\}.$$

In our case, $(X, d_X)$ is the space of all distributions equipped with the
Prokhorov metric, and $(Y, d_Y)$ is the metric space $\mathcal{Y}$ with the $\ell_2$ metric.
Theorem 1 provides an upper bound $\Phi_1$ on the event $\Omega_1$ where the random
matrix $H_{Y(\omega)}(x^*)$ is of full rank, and the first statement of the theorem is
applied to obtain the Ky Fan rate of convergence. Note that we do not need
an upper bound $\Phi_2$ to bound the Ky Fan distance on $\Omega_2$, as long as $P(\Omega_2)$ is
vanishingly small as $\tau \to 0$.

Denote 

v min = min V yexacttt {x*), 

t:V Vc^t tt(x *)>0 

Mfi _ \\P A TVg(x*)\\ 

C l _ ^ 1 AT A\ C 2 



^min-^minjpos (-A ^4) ^min-^min,pos(-4- -4) 

and, for small enough Pk(Y, y exajCt ) and S, 
M f2 p K (Y,y ex£kCt ) n ~' 



Cfc = c k 



^min^min pos (-4 -A) _ 

where \dh is a constant defined by 



[1 - <VpAdh]-\ fc = 1,2, (19) 
Theorem 3. Suppose we have the Bayesian model defined in Section 2.1
and that the assumptions on $f_y$, $g$ and $\delta$ stated in Section 4.1 hold.
Assume that

1. $x^*$ is an interior point of $\mathcal{X}$,

2. $H_\nu = A^T V_{y_{\mathrm{exact}}}(x^*) A + \nu B(x^*)$ is of full rank.

Then, $\exists \tau_0 > 0$ such that for $\forall \tau \in (0, \tau_0]$, and small enough $\nu$ and $\tau/\nu$,

Pi<c(/V>stA*) ^ max |2p K (Y,?/ exac;t ), , cip K (y, y exact ) + c 2 v (20) 

I J- + 



+ 



7 log[C p 



Amin 



1/2 

(1 + A^ K (S)) 



where $c_1$ and $c_2$ are defined by (19), $\Delta_0 = \Delta_0(B(0, \delta))$ is given by (21), and
$\Delta_{*,K}(\delta)$ is defined by (34).

Under the assumptions on $\tau$, $\nu$ and $\delta$, $\Delta_{*,K}(\delta) = o(1)$ as $\tau \to 0$.

Recall that in the ill-posed case (if $A^T V_{y_{\mathrm{exact}}}(x^*) A$ is not of full rank),
$\lambda_{\min} \asymp \nu \cdot \mathrm{const}$, and in the well-posed case $\lambda_{\min} \asymp \mathrm{const}$. Thus, we have the
following corollary.



Corollary 2. Suppose that $\rho_K(Y, y_{\mathrm{exact}}) \le C\sqrt{-\tau \log \tau}$ for some constant $C$,
that the assumptions of Theorem 3 are satisfied, and that $\Delta_0/(1 + \Delta_0)$ is smaller
than the other terms in the maximum.

If $A^T V_{y_{\mathrm{exact}}}(x^*) A$ is of full rank (well-posed problem), the smallest upper
bound on the Ky Fan distance is

$$\rho_K(\mu_{\mathrm{post}}, \delta_{x^*}) \le C_1 (-\tau \log \tau)^{1/2}$$



with 7 2 > r^f-logr]- 1 / 4 . 

If $A^T V_{y_{\mathrm{exact}}}(x^*) A$ is not of full rank (ill-posed problem), the smallest upper
bound on the Ky Fan distance is

$$\rho_K(\mu_{\mathrm{post}}, \delta_{x^*}) \le C_2 (-\tau \log \tau)^{1/3}$$



with 7 2 = r^-logr]- 1 / 6 . 






The assumption of the corollary, $\rho_K(Y, y_{\mathrm{exact}}) \le C\sqrt{-\tau \log \tau}$, is satisfied for
Gaussian random variables $Y$. In Section 3 we saw that it is also satisfied
for such distributions as rescaled Poisson and binomial distributions.

Consider the case of the rescaled Poisson distribution, and a linear inverse 
problem with the identity link. 

Example 2. Rescaled Poisson random variables satisfy the assumptions of
GLIP since their distribution belongs to the exponential family, $EY_i = \mu_i$, $\mathrm{Var}(Y_i) = \tau\mu_i$,
and $Y \to \mu$ as $\tau \to 0$.

Since x* is an interior point of X , then, for small enough a and 7, 

Ci^-rlogr + Ca^ 

7 



+ C 3>a r^~ a ^ 2 -f a ^- log (r^-^/V) 



1 T 1 

constants are given by 



where a = z/ A T V^, xact (x*)74 zs of full rank and a = 1 otherwise, and the 



Ci = 2||y cxact ||i /2 max 



A MM\y^^\\oo \ 

(am); 



^ 2 = if}? J \PaV 9 (x*)\\, 

^mii^pos *\ ) 

C 3 , a = (K p [(l-a)\ min (A T A) + a\ min j-p A (B(x*" Y < ii2 



If $a = 0$ (well-posed problem), the fastest rate is $\sigma\sqrt{-\log \sigma}$, with
$\gamma \ge \sigma^{1/2}[-\log \sigma]^{-1/4}$.

If $a = 1$ (ill-posed problem) and $\tau = \sigma^2$, the fastest rate is $\sigma^{2/3}[-\log \sigma]^{1/3}$,
with $\gamma = \sigma^{2/3}[-\log \sigma]^{-1/6}$.
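
The ill-posed rate can be checked by balancing two orders of magnitude. The sketch below is a simplification made here, not the paper's computation: constants are ignored, log(tau/nu) is replaced by log(tau), the bias term is taken to be of order nu = tau/gamma^2, and the Gaussian spread term is taken to be of order gamma * sqrt(-log tau), using lambda_min proportional to nu.

```python
import math

def ill_posed_balance(tau):
    """Return (gamma, bias order nu, spread order) for the balancing choice of gamma."""
    log_term = -math.log(tau)
    gamma = tau ** (1 / 3) * log_term ** (-1 / 6)
    nu = tau / gamma**2                       # order of the prior-induced bias
    spread = gamma * math.sqrt(log_term)      # order of [-(tau/nu) log tau]^{1/2}
    return gamma, nu, spread

for tau in (1e-4, 1e-6, 1e-8):
    gamma, nu, spread = ill_posed_balance(tau)
    print(f"tau={tau:g}  gamma={gamma:.4e}  nu={nu:.4e}  spread={spread:.4e}")
```

With gamma = tau^{1/3} [-log tau]^{-1/6} the two terms coincide and both equal (-tau log tau)^{1/3}; a larger gamma inflates the spread term and a smaller gamma inflates the bias term. Setting tau = sigma^2 gives the orders sigma^{2/3} [-log sigma]^{1/3} and sigma^{2/3} [-log sigma]^{-1/6} up to constants.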

4.3 Choice of $\delta$

Now, we discuss how to choose $\delta$ in such a way that

$$\int_{\mathcal{X}} e^{-(h_y(x) - h_y(x^*))/\tau}\, dx = (1 + o(1)) \int_{B(x^*, \delta)} e^{-(h_y(x) - h_y(x^*))/\tau}\, dx$$

with high probability as $\tau \to 0$, i.e. that the condition (14), $\Delta_0(B(0, \delta)) \to 0$
as $\tau \to 0$, is satisfied with high probability.
We introduce the following additional notation. Diagonalise the projection
matrices $P_{A^T}$ and $I - P_{A^T}$ simultaneously, so that $P_{A^T} = U^T \mathrm{diag}(I_{p_0}, 0_{p_1}) U$,
$I - P_{A^T} = U^T \mathrm{diag}(0_{p_0}, I_{p_1}) U$ and $U^T U = I_p$, where $p_0 = \mathrm{rank}(A)$ and
$p_1 = p - p_0$.

B n = U^gix^U,. 
First we consider the integral of $e^{-h_y(x)/\tau}$ over $B(x^*, \delta)$.

Lemma 3. Assume that $\Omega_{00}$ and $B_{11}$ are of full rank. Under the assumptions
on $f_y$, $g$ and assumption (i) on $\delta$ stated in Section 4.1,

$$\int_{B(x^*, \delta)} e^{-[h_y(x) - h_y(x^*)]/\tau}\, dx = \tau^{p_0/2} \gamma^{p_1} \frac{(2\pi)^{p/2}\, e^{x_0^T H x_0/(2\tau)}}{[\det(\Omega_{00}) \det(B_{11})]^{1/2}} [1 + o_P(1)].$$

In particular, this implies that

$$\Delta_0(B(0, \delta)) = C_H \tau^{-p_0/2} \gamma^{-p_1} \int_{\mathcal{X} \setminus B(x^*, \delta)} e^{-[h_y(x) - h_y(x^*)]/\tau}\, dx\, [1 + o_P(1)], \tag{21}$$

where $C_H = (2\pi)^{-p/2} [\det(\Omega_{00}) \det(B_{11})]^{1/2} e^{-x_0^T H x_0/(2\tau)}$.

See Proposition 2 in the Appendix for further details and the proof.
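
The Gaussian (Laplace-type) approximation behind this kind of statement can be checked numerically in one dimension. The sketch below uses a hypothetical function h with minimum at x* = 0 and h''(0) = 2, and a radius delta = tau^0.4 chosen here so that delta tends to 0 while delta / sqrt(tau) tends to infinity; none of these choices come from the paper.

```python
import math

def integral_over_ball(h, tau, delta, n=20_000):
    """Midpoint-rule value of the integral of exp(-h(x)/tau) over [-delta, delta]."""
    step = 2.0 * delta / n
    return step * sum(math.exp(-h(-delta + (i + 0.5) * step) / tau) for i in range(n))

def h(x):
    # Hypothetical negative log-posterior: minimum at x* = 0, h(0) = 0, h''(0) = 2.
    return x**2 + x**3

tau = 1e-4
delta = tau ** 0.4                     # shrinking ball, many standard deviations wide
laplace = math.sqrt(2.0 * math.pi * tau / 2.0)   # sqrt(2 pi tau / h''(0))
ratio = integral_over_ball(h, tau, delta) / laplace
print(ratio)                           # close to 1
```

The ratio is close to 1 because the cubic term is negligible on the ball and the Gaussian mass outside it is small, which is the balance that the conditions on delta are designed to secure.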



5 Convergence rate when x* is on the boundary

In this section we consider a special case where the assumption that $x^*$ is an
interior point of $\mathcal{X}$ does not hold. This is an example of the so-called nonregular
models that have been considered mostly for a one-dimensional nonregular
parameter (ref), and, as far as we are aware, have not been considered in
the context of inverse problems. As we shall see, the rate of convergence is
different in this case. We shall also see that, for some probability distributions,
it is possible to observe exact data under the considered probabilistic
model (Section 2).

In this section we assume that the parameter space is $\mathcal{X} = [0, \infty)^p$, and
that each coordinate of $x^*$ is on the boundary of $\mathcal{X}$, i.e. $x^* = 0$.
This is an important benchmark case where there is no signal. Such a setup
arises, for example, in image analysis, where $x$ is the vector of the unknown
intensities, and we want to test whether there is any image present. We could
assume that the parameter $x$ is restricted to an arbitrary convex polyhedron; this
could be reduced to $[0, \infty)^p$ by a linear change of variables.

5.1 Assumptions 

We make the same assumptions on the prior distribution as in Section I4.1[ 
however, we only need the smoothness and the convergence assumptions for 
up to the second derivative only, rather than up to the third. Assumptions 
on 5 - the radius of approximation - are also changed. 
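Smoothness in x.

There exists $\delta > 0$ such that there exist bounded second order derivatives
$\nabla^2 f_y$, $\nabla^2 g$ on $B(x^*,\delta)$ for all $y \in \mathcal{Y}_{loc}$, i.e. $\exists\, C_{f2}, C_{g2} < \infty$ such that for all
$x \in B(x^*,\delta)$ and for all $y \in \mathcal{Y}_{loc}$,

$$\max_{i,j} |\nabla_{ij} f_y(x)| \le C_{f2}, \qquad \max_{i,j} |\nabla_{ij} g(x)| \le C_{g2}. \qquad (22)$$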
Convergence in Y.

$\exists\, M_{f1} < \infty$ such that for all $1 \le j \le p$ and for all $y \in \mathcal{Y}_{loc}$,

$$|\nabla_j f_y(Ax^*) - \nabla_j f_{y_{exact}}(Ax^*)| \le M_{f1}\,\|y - y_{exact}\|. \qquad (23)$$
Assumptions on $\delta$.

Assume that $\delta > 0$ satisfies the following conditions as $\tau \to 0$:
1. 

x 

5^0, ~->-0, 

T 

(not necessary if A T A is of full rank). (24) 

2. With high probability,

$$\Delta_0(B(0,\delta)) \to 0 \quad \text{as } \tau \to 0, \qquad (25)$$
where $\Delta_0(B(0,\delta))$ is defined by (15).



— — > oo 
5.2 Rate of convergence in Ky Fan distance

Define

$$b(\omega) = \nabla h_{Y(\omega)}(x^*).$$

Theorem 4. Suppose we have the Bayesian model defined in Section 2.1
and let the assumptions on $f_y$, $g$ and $\delta$ stated in Section 5.1 hold.

Assume that $x^* = 0$ and that $b_i(\omega) > 0$ for all $i$, and denote $b_{\min}(\omega) = \min_i b_i(\omega)$.

Then, $\exists\,\tau_0 > 0$ such that for $\forall\,\tau \in (0,\tau_0]$ and small enough $\gamma$,

$$\rho_P(\mu_{post}(\omega),\delta_{x^*}) \le \max\left\{\frac{\Delta_0}{1+\Delta_0},\ -\frac{\tau\sqrt{p}}{b_{\min}(\omega)}\log\left(\frac{\tau}{\sqrt{p}\,b_{\min}(\omega)}\right)(1+\Delta_4)\right\},$$

where $\Delta_0 = \Delta_0(B(0,\delta))$ is defined by (15) and $\Delta_4(\delta, Y(\omega))$ is defined by (37).

Recall that $b(\omega) = A^T\nabla f_{Y(\omega)}(Ax^*) + \nu\nabla g(x^*)$. Thus, if the image of $A$
includes the whole set $\mathcal{X}$ (well-posed case), the leading term of $b(\omega)$ for
each coordinate is a constant, and the rate of convergence is determined by
$-\tau\log\tau$. However, if $\mathrm{rank}(A) < p$ (ill-posed case), then for some coordinates
the leading term of $b(\omega)$ is $\nu\cdot\mathrm{const} \to 0$, and the rate of convergence is
determined by $-\gamma^2\log\gamma$.

To have consistency in the ill-posed case, we must have $\tau/\nu = \gamma^2 \to 0$.
Hence, in this case, to have convergence we must assume that $\nu = \tau/\gamma^2 \to 0$
and $\gamma \to 0$ as $\tau \to 0$.
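
To illustrate the difference between the two rates, the following minimal Python sketch compares the Ky Fan distance to a point mass at 0 for two idealised one-dimensional posteriors. Both are assumptions made purely for illustration and are not taken from the model above: an exponential density on $[0,\infty)$ with rate $b/\tau$ (mimicking the boundary case, where the leading term of $h_y$ is linear), and a $N(0,\tau/\lambda)$ density (mimicking the interior case, where the leading term is quadratic). The function names and the default values $b = \lambda = 1$ are hypothetical. The distance is computed as the smallest $\varepsilon$ with $P(|\xi| > \varepsilon) \le \varepsilon$.

```python
import math


def ky_fan_to_point(tail, iters=200):
    """Smallest eps in (0, 1] with tail(eps) <= eps, found by bisection.

    tail(eps) = P(|xi| > eps) must be non-increasing in eps.
    """
    lo, hi = 0.0, 1.0  # eps = 1 always satisfies tail(eps) <= eps
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tail(mid) <= mid:
            hi = mid
        else:
            lo = mid
    return hi


def boundary_distance(tau, b=1.0):
    """Illustrative boundary case: xi ~ Exp(b / tau) on [0, inf)."""
    return ky_fan_to_point(lambda e: math.exp(-b * e / tau))


def interior_distance(tau, lam=1.0):
    """Illustrative interior case: xi ~ N(0, tau / lam)."""
    scale = math.sqrt(lam / (2.0 * tau))
    return ky_fan_to_point(lambda e: math.erfc(e * scale))


if __name__ == "__main__":
    for tau in (1e-2, 1e-3, 1e-4):
        print(tau, boundary_distance(tau), interior_distance(tau))
```

In this sketch the boundary distance stays below $-(\tau/b)\log(\tau/b)$, in line with the $-\tau\log\tau$ rate, while the interior distance is of the larger order $\sqrt{-\tau\log\tau}$.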

Now we apply Theorems 2 and 4 to obtain an upper bound on the Ky
Fan distance. Define

$$b^* = \nabla h_{y_{exact}}(x^*).$$

Theorem 5. Consider the Bayesian model defined in Section 2.1 and suppose
that the assumptions on $f_y$ and $g$ stated in Section 5.1 hold.

Assume that $\mathcal{X} = [0,\infty)^p$, $x^* = 0$, $V_{y_{exact}}(G(y_{exact})) > 0$ and $b_i^* > 0$ for
all $i$. Denote $b^*_{\min} = \min_i b_i^*$. If $\mathrm{rank}(A^TA) < p$, assume also that $\gamma \to 0$ and
$\tau/\gamma^2 \to 0$ as $\tau \to 0$.

Then, for small enough $\tau$, $\gamma$ and $\nu$,

$$\rho_K(\mu_{post},\delta_{x^*}) \le \max\left\{2\rho_K(Y,y_{exact}),\ \Delta_0^*,\ -\frac{\tau\sqrt{p}}{b^*_{\min}}\log\left(\frac{\tau}{\sqrt{p}\,b^*_{\min}}\right)(1+\Delta_5^*)\right\},$$
where $\Delta_0^*$ is defined by (21), and



^11 



log 



^11 — 71 PK{I, 2/exactJ + — 



A* 



log((l + At)/(l + Ag)) 
log (v^^Jl-Anl/r) 



A* = 1 | Hi All ^ P [l c -max I fe*(l+A 11 )^/U/pr)jP^ 

Under the assumptions on $\tau$, $\gamma$ and $\delta$ given in Section 5.1, $\Delta_5^*(\delta) = o(1)$
as $\tau \to 0$.

Hence, when the solution is on the boundary, the posterior distribution
converges at a different rate, which is faster than the
corresponding rate when the solution is an interior point. This agrees with
other studies of the rate of convergence of the posterior distribution for
error densities with jumps (?; ?).

Examples. 

1. Rescaled Poisson distribution: $Y_i/\tau \sim \mathrm{Pois}(A_i x/\tau)$, independent. For
$x^* = 0$, we have $P(Y_i = 0) = 1$ for all $i$. The Ky Fan distance between
the data and its limit is zero, so we observe exact data. In this case,
we can recover $P_{A^T}x$ exactly.

If A T A is of full rank, the Ky Fan distance and we recover x* exactly. 
If $A^TA$ is not of full rank, the upper bound is of order $-\gamma^2\log(\gamma^2)$ and
can be arbitrarily small. This rate is faster than the rate in the case
$x^*$ is an interior point.

2. Exponential error distribution: $Y_i - A_i x \sim \mathrm{Exp}(\lambda_i/\tau)$, independent.
For $x^* = 0$, we have $Y_i \sim \mathrm{Exp}(\lambda_i/\tau)$. In the well-posed case, the Ky
Fan distance between the data and its limit is $-\lambda_E\tau\log\tau$, i.e. it is of
the same order as the rate of contraction of the posterior distribution
to its maximum, where $\lambda_E$ is a function of $\lambda_1,\dots,\lambda_n$. In the ill-posed
case, the dominating rate is of order $-\gamma^2\log(\gamma^2)$, which is faster than
the corresponding rate when $x^* \in \mathrm{int}(\mathcal{X})$.
A comprehensive study of the rate of convergence of inverse problems
under a more general setting (when $x^*$ is an arbitrary point on the boundary)
is beyond the scope of this paper and is work in progress.
.1 Proofs of the results in Section 4

Lemma 4. Denote $H = A^TV_y(x^*)A + \nu B(x^*)$, $k_a = \frac{p}{3}C_{f3}$, $k_b = \frac{p}{3}C_{g3}$.

Assume that $H$ is invertible, and that $x^*$ is an interior point of $\mathcal{X}$.

Let $x \in B_\delta = \{x \in \mathcal{X} : \|x - x^*\| < \delta\}$, and denote $v = (x - x^*)/\sqrt{\tau}$.

1. Upper bound. For small enough $\delta$ and $\nu$, we have the following
upper bound:

[h y {x)-h y {x k )]/r < \\H l /\v-H- l Hx Q /^)\\ 2 /2 

[Mfi | \y - y cxact 1 1 + v\ \P AT Vg{x*) | |] 2 



+ 



T[\ minpos {A T V y (x*)A) - 6k a ] 



where $D = k_aP_{A^T} + \nu k_bI$ and $\bar{H} = H + \delta\sqrt{p}\,D$.

2. Lower bound. For small enough $\delta$ and $\nu$, we have the following
lower bound:

[h y (x)-h y (x*)]/T > WH^v-H-'Hxo/^W 2 ^ 

[M fl \\y - y cxact | | - v\ \P a t Vg(x*) 1 1] 2 



+ 



r[X minpos (A T V y (x*)A) + 5k a + v\ min ,p AT (B)] ' 
where $\underline{H} = H - \delta\sqrt{p}\,D$.

Proof. Approximate $h_y(x)$ by a quadratic function using a Taylor expansion
in a neighbourhood of $x^*$:

$$h_y(x) = h_y(x^*) + [\nabla h_y(x^*)]^T(x-x^*) + \tfrac{1}{2}(x-x^*)^T\nabla^2h_y(x^*)(x-x^*) + \Delta_{00}(x).$$

Bound $\Delta_{00}$ for $w = (x - x^*) \in B_\delta$ using the Taylor expansion of $h_y(x)$:
$\exists\, x_c \in (x, x^*)$ such that

$$\Delta_{00}(\delta) = \frac{1}{6}\sum_{i,j,k}\nabla_{ijk}h_y(x_c)(x_i-x_i^*)(x_j-x_j^*)(x_k-x_k^*) = \frac{1}{6}\sum_i (x_i-x_i^*)\,(x-x^*)^T\nabla_i\nabla^2h_y(z)(x-x^*)\Big|_{z=x_c}.$$

Note that

$$(x-x^*)^T\nabla^2h_y(z)(x-x^*) = (x-x^*)^TP_{A^T}\nabla^2f_y(z)P_{A^T}(x-x^*) + \nu(x-x^*)^T\nabla^2g(z)(x-x^*).$$

Differentiating with respect to $z$ and bounding the third derivatives of $f_y$
and $g$ using the Smoothness Assumption, we have that for every $i$, with high
probability,

$$|(x-x^*)^TP_{A^T}\nabla_i\nabla^2f_y(z)P_{A^T}(x-x^*)| \le C_{f3}\|P_{A^T}(x-x^*)\|_1^2 \le pC_{f3}\|P_{A^T}(x-x^*)\|_2^2,$$
and, similarly,

$$|(x-x^*)^T\nabla_i\nabla^2g(z)(x-x^*)| \le C_{g3}\|x-x^*\|_1^2 \le pC_{g3}\|x-x^*\|_2^2.$$
Applying these inequalities together, we have

$$|\Delta_{00}(\delta)| \le \frac{1}{6}\|x-x^*\|_1\max_i|(x-x^*)^T\nabla_i\nabla^2h_y(z)(x-x^*)| \le \frac{\delta\sqrt{p}}{6}(x-x^*)^T\left(pC_{f3}P_{A^T} + \nu pC_{g3}I\right)(x-x^*).$$

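1. The upper bound. Making the change of variables $v = (x - x^*)/\sqrt{\tau}$,
we have, with $\bar{H} = H + \delta\sqrt{p}\,D$,

$$[h_y(x)-h_y(x^*)]/\tau \le -v^THx_0/\sqrt{\tau} + \tfrac{1}{2}v^T(H+\delta\sqrt{p}\,D)v = \tfrac{1}{2}(v-\bar{H}^{-1}Hx_0/\sqrt{\tau})^T\bar{H}(v-\bar{H}^{-1}Hx_0/\sqrt{\tau}) - \frac{1}{2\tau}\|\bar{H}^{-1/2}Hx_0\|^2.$$

2. The lower bound. A similar argument leads to the following lower
bound, with $\underline{H} = H - \delta\sqrt{p}\,D$:

$$[h_y(x)-h_y(x^*)]/\tau \ge -v^THx_0/\sqrt{\tau} + \tfrac{1}{2}v^T(H-\delta\sqrt{p}\,D)v = \tfrac{1}{2}(v-\underline{H}^{-1}Hx_0/\sqrt{\tau})^T\underline{H}(v-\underline{H}^{-1}Hx_0/\sqrt{\tau}) - \frac{1}{2\tau}\|\underline{H}^{-1/2}Hx_0\|^2.$$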

□ 

Proposition 2. Let the assumptions on $f_y$, $g$ in Section 4.1 and assumptions
(12) on $\delta$ hold. Assume that $H = A^TV_y(x^*)A + \nu B(x^*)$ is of full rank, and
that $\gamma \to 0$ and $\nu \to 0$ as $\tau \to 0$.

Then, for any e € (ciPk(Y, j/ e xact) + c^, 5) such that e/j — > oo, 



f p- h y( x )/ T f J r 
JX\B(x*,e) C " A < 

f x e~ h y^l T dx 



1 -r 



\^{hi)\e-\\E-^Ex^\\\ 2 p 
2t 1 2 
1 + A 2 A



+ 



1 + A 1 + A 



and, in particular, 

e -[hy{x)-h y (x*)]/T ^ y 



T 



B(x*,S) [det(#)] 1/2 
where A2 is defined by 



[ 2vr ]P/2 exp <( -5 ° J> [1 + A s 



2r 



A 2 (<y,2/) 



r 



2t 





-1 


1 p ^ 




V. 





1 + dy/pXnp 
1 - 5y/pX H D 



TP/2 



[M/iHy-ye^tH+HI^V^)] 2 



X exp < (/C^ + Z/K B ) 



-1 + 



X max (H)(6 



2t 



(,26) 



(27) 



Here Xhd = X min (HD ) = min 



A min ,P A v ,(A T y !/ (z*)A+i/B(2:*)) A min ,i_ P (B(:r*)) 



Proof of Proposition [B Making the change of variables v = (x—x*) / y/r with 
Jacobian J = r p / 2 and applying Lemmas H] and we have 



/ e -[h«W-hy exact (x)]/T dx > r P/2 exp r I \H-V 2 HXo\\ 2 /(2T)} 

JB(x*,S) [ - ' 



x / exp 



> T^expjll/r^lP/pr)} [2 T ]>" 2 [det(5)]-^r ^ ^(gM±lg^££ll]! | |j . 

In particular, we have the statement of Lemma 3:

$$\int_{B(x^*,\delta)} e^{-[h_y(x)-h_y(x^*)]/\tau}\,dx \ge \tau^{p/2}\exp\{x_0^TH\bar{H}^{-1}Hx_0/(2\tau)\}\,[2\pi]^{p/2}[\det(\bar{H})]^{-1/2}[1+\Delta_3]$$

with $\Delta_3$ defined by

Aafty) = _ 1+r f Wg)[^+Hg- 1 ^oll] 2 | | 
The error $\Delta_3 \to 0$ as $\tau \to 0$, since $\lambda_{\min}(H)\delta^2/\tau \to \infty$ with $P_{y_{exact}}$ probability
$\to 1$.

Similarly, we can obtain an upper bound on this integral: 



Ib(x 



(x*,S)\B(x*,e) 
X 



-[h y (x)-h v (x*)]/T dx < rP /2 exp |||^l/2 Fxo ||2 /(2r) | 



exp 



£/V?<lkll<5/V? 



ill^/^-r- 1 / 2 ^- 1 ^)!! 2 ^^- 



Assume that $\delta$ is small enough so that $\underline{H}$ is positive definite.

Combining these results together, we have that for $\varepsilon > \|\underline{H}^{-1}Hx_0\|$,



B(x* ,S)\B(x* ,e) 



-hy(x)/T^ x 



lB(x*, S ) e ' KiX)/Tdx 



< 



i-r 



X^H^e-WH^HxqW} 2 p 

1 2 



x 



X ^H^S+WH-'HxqW] 2 p 
2t 1 2 



2r 

i -1 r ~ -i 1/2 
det(#) 



det(H) 



X 



exp{(5 v / p(V/ij / (x*)) T J ff- 1 M- 1 V/i y (x*)/r 



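since

$$\underline{H}^{-1} - \bar{H}^{-1} = \underline{H}^{-1}(\bar{H}-\underline{H})\bar{H}^{-1} = 2\delta\sqrt{p}\,\underline{H}^{-1}D\bar{H}^{-1}.$$
The ratio of the determinants can be bounded by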



det(#) _ &et{I + 5^pH- l D) f 1 + 5^pX max (DH 



det(H) det(I - d^pH^D) 
By Lemma El 

y {x*)) T H~ 1 DH- 1 Vhy{x k ) < (k a + uk b 

Thus, we have 



-i\ \ p 



1 - 5^\ max (DH-i) 



[M fl \\y - y exact \ \ + v\\P A Vg(x*)} 2 
X miD (H)[e- WH-'HxoW} 2 



B(x*,S)\B(x*,e) 



-h y (x)/r^ x 



B(x*,S) 



X 



-h y (x)/r^ x 



< 



i - r 



2r 



X max (H)(6+\\H- l Hx c 



-i -i 



2r 



V 
2 



1 + S^pX^DH- 1 ) 

Ll-^maxlM- 1 )] 



p/2 



[M fl \\y - yexact \\ +v\\P A Vg(x*)} 2 
r[X 2 mm (A T V y (x*)A)-5 2 pK 2 A } 
x exp < 5^ (k a + vk b



Now we take into account the error of approximating the integral over X 
by the integral over B(x*,e): 



X\B(x*,e) 



-h y (x)/T^ x 



Ib(x 



(x* ,6)\B(x* ,e) 



-h y (x)/r dx + J 



X\B(x*,5) 



- h y( x )/ T dx 



j x e-h y {x)/r ( l x 



fB(x*,6) e - hyiX)/Tdx + L 



X\B(x*,8) 



e -hy{x)/T(l x 



B(x* ,5)\B(x* ,e) 



-h y (x)/T^ x 



A 



{l+A )f B{xir)g) e- h y^dx 1 + Ao' 
Substituting the previous upper bound, we have the required statement. 



□ 



Proof of Theorem 1. By Strassen's theorem, for any $x$, $\rho_P(\mu_{post}(\omega),\delta_x) = \rho_K(\xi,x)$
where $\xi \sim \mu_{post}(\omega)$. Hence, we find an upper bound on the Ky Fan
distance between $\xi$ and $x^*$.

Take $\varepsilon > \|x_0\|$. Using Proposition 2, an upper bound $\varepsilon$ on the Ky Fan distance
satisfies



Ib(x* 



(x* ,S)\B(x* ,e) 



-hy{x)/T^ x 



L 



B(x*,5) 



-hy(x)/T(l X 



^ A 

< e, 



i - r 



[e- WH-'HxoWfX^jH) , P 

1 2 



2r 



where $\tilde\Delta_0 = \Delta_0/(1+\Delta_0)$ and $\tilde\Delta_2 = (1+\Delta_2)/(1+\Delta_0) - 1$. The last inequality
implies that as $\tau/\lambda_{\min}(H) \to 0$, $\varepsilon \to 0$ and $\varepsilon^2\lambda_{\min}(H)/\tau \to \infty$. Hence, using
Lemma 5 we have that



e < \\H- 1 Hx \\ + 
By Lemma [HI 

Ikoll < 



\ 



(H) &1 " 



A min (i?)(l + A 2 )2 



M f i\\y - y cxact \ \ + vWPArVgix* 



Amin pos 

which we can substitute into the upper bound for e, and 

ll^- 1 ^!! = -d^H-'D)- 1 1 1 < [1 - Sy/p^iDH- 1 )]' 1 . 



:i+a 2 ) 
Hence an upper bound on the Ky Fan distance is the smallest $\varepsilon > \tilde\Delta_0 > 0$
that satisfies the obtained upper bound. Therefore, the Ky Fan distance (and
thus, the corresponding Prokhorov distance) is bounded from above by:



(impost MA*) ^ max 



Ap M fl | | Y M ~ ?/exact 1 1 + V\ \P A? Vflfr*) \ \ 

1 + Ao -^min,pos 



+ 



Amin(w) 



log a 



A m inM 



(i + a^ym)) 



where A min (u;) = X min (H Y ^)(x' k )), A = A (B(0,8)) defined by ([15]) and A* 
is defined by 



A*(a,2/) 



1 + A c 
(1 + A, 



1 + 2 



log(l + A 2 ) - log(l + A ) 
log (A min (#)/r) 



1/2 



Proof of Theorem 2. First we note that

P { d x (X 1 (u),X 2 (uj)) ^ ^ 1 ( PK (Y 1 , y 2 )) n n x } 
+P { d x (X 1 (u),X 2 (u)) ^ $ 2 (p K (^i, y 2 )) n fi 2 } 

> p{$ 1 (d y (y 1 M,^M)) <*i(p K (5i,^ 2 ))nfi 1 } 

+p { $ 2 K(>1M, r 2 M)) < $ 2 (p K (Fi, y 2 )) n ft 2 } 
= p{d I ,(y 1 M,>2M) <p K (ii,i2)nni} 



+ 



,(yxM, ^M) < pk(vi ,y 2 ) n n 2 } 



= Pfd^yM^sM) ^pk(Yi,Y 2 )} 

> i- PK (Y 1) y 2 ). 

On the other hand, 

p { 4(ii(w),i 2 H) ^ $i(pk(Yi, y 2 )) n fii} 
+p { 4(^iM> ^M) < $ 2 (pk(y 1 , y 2 )) n fi 2 } 

< P { 4(Xx(a;), X 2 M) ^ $i(pk(Yi, Y 2 ))} + P { fi 2 } . 
Putting these together implies 

P{4(^iM>^2M) > $i(pk(Y,Y 2 ))} <p K (Yi,Y 2 )+P{fi 2 } 



(28) 
□ 
hence, using Lemma 6 we have

$$\rho_K(X_1,X_2) \le \max\{\Phi_1(\rho_K(Y_1,Y_2)),\ \rho_K(Y_1,Y_2) + P(\Omega_2)\}$$



and we have the first statement. The second statement follows from the first 
inequality and 



Proof of Theorem 3. Now we prove Theorem 3 in the notation defined in
the proof of Theorem 1.

We apply Theorem 2 with $\Omega_1 = \{\omega : \|Y(\omega) - y_{exact}\| \le \rho_K(Y,y_{exact})\}$ and
$\Omega_2 = \Omega\setminus\Omega_1$ with $P(\Omega_2) \le \rho_K(Y,y_{exact})$ by the definition of the Ky Fan distance,
with the bound $\Phi_1$ given in Theorem 3, which we modify to depend on $y$ only
via $\|y - y_{exact}\|$. For small enough $\tau$, the assumption of the theorems that
$H = A^TV_y(x^*)A + \nu B(x^*)$ is of full rank holds on $\Omega_1$, as we shall show below.

The upper bound depends on $y$ via $\|y - y_{exact}\|$, $\lambda_{\min}(H_y(x^*))$, $\lambda_{\min pos}(A^TV_y(x^*)A)$,
$\Delta_0$ and $\Delta_2$.

We start bounding the eigenvalues from below. Denote H u = Hy wgfSt (x*). 



Since \[H y (x*) - H y _ ct {x% t] \ = \[V 2 f y (x*) - V 2 f Vc ^)U < M f2 \\y - 



Z/exact||, on Qi we have, by Lemma HI 

A m ax(^) < A max (F ?/exact (a;*)) + M f2 p K (Y, y cxact ) + Sy/pX^DH" 1 ). 

Similarly, since A T V y (x)A = V 2 f y (x), 

^(Vyix^-V^x^AU < M f2 \\Y-y cxact \\ for all ij, 

hence $\lambda_{\min pos}(A^TV_Y(x^*)A) \ge \lambda_{\min pos}(A^TV_{y_{exact}}(x^*)A) - M_{f2}\|Y - y_{exact}\|$. The
lower bound is positive for small enough $\tau$.

We also need to bound $\lambda_{\max}(DH^{-1})$ from above, or equivalently, its inverse
from below, on $\Omega_1$,



P { d x (X 1 (u),X 2 (u)) < Mpk(Yx, Y 2 )) n fii} 

+p { x 2 (w)) < ^ 2 ( P k(y u y 2 )) n fi 2 } 

< P{4(Xi(a;),X 2 (a;)) < max^^n,^)), $2(^1,^2))]}. 



□ 



X m in(HD ) 



> min 



mm 



<fe/ 



■min, Pa, v 



mm,pos 
Also, on $\Omega_1$:

$$\|\underline{H}^{-1}Hx_0\| \le \tilde{c}_1\|Y - y_{exact}\| + \tilde{c}_2\nu,$$

where $\tilde{c}_k = c_k[1 - M_{f2}\rho_K(Y,y_{exact})/\lambda_{\min pos}(A^TV_{y_{exact}}(x^*)A)]^{-1}$ for $k = 1,2$.
Hence, on $\Omega_1$,



A 2 (<5,F(u;)) < -1 + exp 



/ (5p 3/2 (C / 3 + ^C s 3)[ciPK(y,2/exact) + C 2 ^] 



X 
X 

cfe/ 



3r[l - S^p/XU 

[A ma x(^) + M /2 p K (F,y cxact ) + 5 y/p\ min (D H' 1 )] [5 + c!p K (Y, y cxact ) + c£z/] 



1 + Sy/p/X DH 

1 - Sy/P/\ DH )_ 



p/2 



[i + a; 



*1-1 

oJ 



A* 

2 ' 



By Lemma 9,

det(if) < det(#„) 



M f2 p K {Y, 

J/cxact / 



rank(A T .4) 



Amin pos(A T V r 2/cxact (X*)A) _ 

Hence, $\Delta_0$ is bounded on $\Omega_1$ from above by



A5(B(0,5)) 



exp {[2r] 1 [cii/ - c 2 p K {Y, y C xact)] 2 } 
[1 + A 22 ]rank(A-A)/ 2 [ det (^)]i/2 



[1 + Ai 



[2vr]P/ 2 



(31) 
(32) 



where $\Delta_{22} = M_{f2}\rho_K(Y,y_{exact})/\lambda_{\min pos}(A^TV_{y_{exact}}(x^*)A)$. $\Delta_3^*$ is a lower bound
on $\Delta_3$ on $\Omega_1$ derived in a similar way:



A3 



i + r 



Therefore, we have that, on $\Omega_1$,

Ci\\Y -?/exact|| +C 2 U 



e < 



+ 



^ - Sy/P/XDH) 



A m in(l — A 22 ) 



log a 



A min (l- A 22 )(l+ A 



*^ 2 
[A max (i^) + M f2 p K (Y, y cxact ) + Sy/p\ min (DH u x )] [5 + cip K (Y, 2/exact) + o%v\
since the function $-x\log x$ increases for $x < 1/e$.

The bound on $\varepsilon$ increases as a function of $\|y - y_{exact}\|$. Using the lifting
Theorem 2 we have that, for small enough $\tau$, $\nu$,

0k(AW> S x *) < max {2p K (Y, y cxact ), A*, [cip K (Y, y exac t) + c 2 v] 



+ 

Denoting 
A^(5) = 




log a 



i 



i + 



A min (l- A 22 )(l 



21og(l + A*)+log(l- A 22 ) 




(1 + A*)(1-A 22 )V2 
we have the statement of Theorem 3.



log(A min /r) 



1/2 



- 1(34) 



□ 



.2 Ky Fan distance inequalities 

Lemma 5. Assume that $A \to 0$ and $A \le e^{-1}$. Then the solution of

$$\exp\{-z/A\} = z$$

satisfies

$$z = -A\log(A)(1+w_A),$$
where $w_A \le 0$ and $w_A = o(1)$ as $A \to 0$.

Proof of Lemma 5. Taking the logarithm of the given expression, we
have

$$-z/A = \log z.$$

Since $A \to 0$, we must have $z/\log z \to 0$, which implies $z \to 0$. Denote
$f = z/A$, i.e. $z = Af$. Hence, the equation above can be rewritten as

$$-f = \log A + \log f,$$

implying that $f \to \infty$ as $A \to 0$ at the rate $f = -\log A\,(1+o(1))$. Hence,
the solution is $z = -A\log A\,(1+o(1))$.

To show that $z \le z_* = -A\log(A)$, we note that for $A \le e^{-1}$,

$$\exp\{z_*/A\}z_* = \exp\{-\log(A)\}(-A\log(A)) = -\log(A) \ge 1 = \exp\{z/A\}z,$$

implying the desired inequality.

□ 
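
As a numerical sanity check of Lemma 5 (not part of the proof), the sketch below solves $\exp\{-z/A\} = z$ by bisection in the rescaled variable $f = z/A$, which satisfies $f + \log f = -\log A$, and reports the relative correction $w_A = z/(-A\log A) - 1$. The function names are hypothetical, and the bracketing interval $[1, -\log A]$ relies on the assumption $A \le e^{-1}$.

```python
import math


def solve_rescaled(A, iters=200):
    """Return f = z / A, where z solves exp(-z / A) = z, for 0 < A <= 1/e."""
    L = -math.log(A)   # L >= 1 because A <= 1/e
    lo, hi = 1.0, L    # f + log(f) - L is <= 0 at f = 1 and >= 0 at f = L
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + math.log(mid) < L:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def relative_correction(A):
    """w_A in the representation z = -A log(A) (1 + w_A)."""
    return solve_rescaled(A) / (-math.log(A)) - 1.0


if __name__ == "__main__":
    for A in (1e-2, 1e-8, 1e-32, 1e-128):
        print(A, relative_correction(A))
```

In this check the correction is negative and shrinks only slowly as $A$ decreases, which is consistent with $w_A \le 0$ and $w_A = o(1)$.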
The following lemma follows directly from the definition of the Ky Fan
distance.

Lemma 6. If $P(d(X,Y) > \varepsilon_1) \le \varepsilon_2$ for some $\varepsilon_1, \varepsilon_2 \in (0,1)$, then
$\rho_K(X,Y) \le \max(\varepsilon_1,\varepsilon_2)$.
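
Lemma 6 also suggests a simple plug-in estimate of the Ky Fan distance from paired samples. The sketch below is an illustration under assumptions that are not part of the paper: it takes a list of observed distances $d(X_k,Y_k)$, replaces the probability by the empirical frequency, and uses the convention $\rho_K(X,Y) = \inf\{\varepsilon > 0 : P(d(X,Y) > \varepsilon) \le \varepsilon\}$. The function names are hypothetical.

```python
def empirical_tail(distances, eps):
    """Empirical frequency of the event d(X, Y) > eps."""
    return sum(1 for d in distances if d > eps) / len(distances)


def empirical_ky_fan(distances, iters=100):
    """Bisection for inf{eps in (0, 1]: empirical_tail(eps) <= eps}."""
    lo, hi = 0.0, 1.0  # eps = 1 always satisfies the condition
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if empirical_tail(distances, mid) <= mid:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    sample = [k / 100 for k in range(100)]
    print(empirical_ky_fan(sample))
```

For any threshold $\varepsilon_1$, setting $\varepsilon_2$ to the empirical tail at $\varepsilon_1$ gives the bound $\max(\varepsilon_1,\varepsilon_2)$ of Lemma 6 for this estimate.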

.3 Proof of results in Section EH 

Proof of Lemma\^ Apply the Chernoff- Cramer bound to obtain that for all 
t and all x, e > 0, 

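$$P(\|Y-\mu\| > \varepsilon) \le e^{-\varepsilon x}\,E e^{x\|Y-\mu\|} \le e^{-\varepsilon x}\,E e^{x\|Y-\mu\|_1} = e^{-\varepsilon x}\prod_i E e^{x|Y_i-\mu_i|}.$$

Now, $E e^{x|Y_i-\mu_i|} \le E e^{x(Y_i-\mu_i)} + E e^{-x(Y_i-\mu_i)}$. The cumulant function of a Poisson
random variable $Z$ with parameter $\lambda$ is $\log E e^{\varepsilon Z} = \lambda[e^{\varepsilon} - 1]$; hence, for
$Y_i = \tau Z$ and $\lambda = \mu_i/\tau$, the cumulant function of $Y_i - \mu_i$ is

$$c_i(x) = \log E e^{x(Y_i-\mu_i)} = \log E e^{x\tau Z} - x\mu_i = \frac{\mu_i}{\tau}[e^{x\tau} - 1 - x\tau].$$

Hence, the cumulants of the rescaled Poisson distribution are $\kappa_k = \mu_i\tau^{k-1}$.
Similarly,

$$\log E e^{-x(Y_i-\mu_i)} = \frac{\mu_i}{\tau}[e^{-x\tau} - 1 + x\tau] \le c_i(x) \quad \forall x > 0.$$

Hence, denoting $M = 2\sum_i\mu_i$, we have

$$P(\|Y-\mu\| > \varepsilon) \le e^{-\varepsilon x}e^{2\sum_i c_i(x)} = \exp\{-\varepsilon x + M[e^{x\tau} - 1 - x\tau]/\tau\}.$$

Since $x > 0$ is arbitrary, we can take $x$ corresponding to the minimum of the
upper bound, which is achieved at $x = \tau^{-1}\log(1+\varepsilon/M)$, implying

$$P(\|Y-\mu\| > \varepsilon) \le \exp\left\{-\frac{M+\varepsilon}{\tau}\log\left(1+\frac{\varepsilon}{M}\right) + \frac{\varepsilon}{\tau}\right\} \le \exp\left\{-\frac{\varepsilon^2}{2M\tau}\left(1-\frac{\varepsilon}{3M}\right)\right\},$$

due to the inequality $(1+x)\log(1+x) - x \ge \frac{x^2}{2}\left(1-\frac{x}{3}\right)$ for small enough
$x > 0$. For $\varepsilon \le 3M/2$ we have

$$P(\|Y-\mu\| > \varepsilon) \le \exp\left\{-\frac{\varepsilon^2}{4M\tau}\right\}.$$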
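
The optimisation step above can be checked numerically. The following sketch (hypothetical function names; the values of $\varepsilon$, $M$ and $\tau$ in the usage example are arbitrary) evaluates the exponent $-\varepsilon x + M[e^{x\tau} - 1 - x\tau]/\tau$, its minimiser $x = \tau^{-1}\log(1+\varepsilon/M)$, and the simplified exponent $-\varepsilon^2/(4M\tau)$ stated for $\varepsilon \le 3M/2$.

```python
import math


def chernoff_exponent(x, eps, M, tau):
    """Exponent -eps*x + M*(exp(x*tau) - 1 - x*tau)/tau of the Chernoff bound."""
    return -eps * x + M * (math.expm1(x * tau) - x * tau) / tau


def optimal_x(eps, M, tau):
    """Stationary point of the exponent: exp(x*tau) = 1 + eps/M."""
    return math.log1p(eps / M) / tau


def optimised_exponent(eps, M, tau):
    """Closed form of the exponent evaluated at optimal_x."""
    return -((M + eps) / tau) * math.log1p(eps / M) + eps / tau


def gaussian_type_exponent(eps, M, tau):
    """Simplified exponent -eps^2/(4*M*tau), stated for eps <= 3*M/2."""
    return -eps ** 2 / (4.0 * M * tau)


if __name__ == "__main__":
    M, tau = 2.0, 0.01
    for eps in (0.1, 1.0, 3.0):
        x0 = optimal_x(eps, M, tau)
        print(eps, chernoff_exponent(x0, eps, M, tau),
              gaussian_type_exponent(eps, M, tau))
```

Because the exponent is convex in $x$, the stationary point is its global minimiser, so comparing with nearby values of $x$ is a sufficient spot check.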
Using Lemma 5 for $\tau \le 1/(2eM)$, the solution of $\exp\{-\varepsilon^2/(4M\tau)\} = \varepsilon$
satisfies

$$\varepsilon = \sqrt{-2\tau M\log(2\tau M)(1+w)},$$
where $w = o(1)$ as $\tau \to 0$ and $w \le 0$.

□ 

Proof of Proposition 1. 1. Following the rescaled Poisson example, we have
that the cumulant function for $Y_i$ is bounded by

rn 2 _____ /-y* ^ 'Y> 2 ~1 / ^y,_\/tJ 

Q(x)=logEe = X/J, t + yIPjr + } j —K k < XfMj + -p-w^r + - } j C t w t 

i=3 i=3 

= x ^ t + ^ WtT + ^[e XT -l-XT-(xT) 2 /2} 

2 T 

C t w t . 
^ H [e x — 1 — xt\, 

T 

since $C_i \ge 1$. Similarly, $\log E e^{-xY_i}$ can be bounded in the same way. Hence,
we have

$$P(\|Y-\mu\| > \varepsilon) \le e^{-\varepsilon x}e^{2\sum_i c_i(x)} = \exp\left\{-\varepsilon x + \frac{M}{\tau}[e^{x\tau} - 1 - x\tau]\right\},$$

where $M = 2\sum_i C_iw_i$. Now, this is the same upper bound as for the rescaled
Poisson distribution. Hence, we have the same inequality for the Ky Fan
distance.

2. Apply the Markov inequality to the random variable $\|Y-\mu\|^{\kappa}$:

Klir-Pll >«) < E||y ;/ r < E|l y '* < " T "' z T Lg . 

Hence, an upper bound on the Ky Fan distance satisfies nrLx/z K = z, i.e. 

z =[ nT rn(K)/2 LK }l/(K+l)_ 

□ 

.4 Proofs of the results in Section 5

Lemma 7. Denote $\delta_b = \frac{\delta}{2}[C_{f2}A^TA + \nu C_{g2}I]\mathbf{1}$, and assume that $b_i(\omega) > 0$
for all $i$.

Let $x \in B_\delta = \{x \in \mathcal{X} : \|x - x^*\| < \delta\}$, $x^* = 0$. Then, for small enough $\delta$
and $\nu$, we have the following bounds:

$$h_y(x) - h_y(x^*) \le (b(\omega) + \delta_b)^T(x-x^*),$$
$$h_y(x) - h_y(x^*) \ge (b(\omega) - \delta_b)^T(x-x^*).$$
Proof. Approximate $h_y(x)$ by a linear function using a Taylor expansion
in a neighbourhood of $x^*$:

$$h_y(x) = h_y(x^*) + [\nabla h_y(x^*)]^T(x-x^*) + \Delta_{00}(x).$$

Similarly to the proof of Lemma 4, bound $\Delta_{00}$ for $w = x - x^* \in B(0,\delta) \cap
(\mathcal{X} - x^*)$ using the Taylor expansion of $h_y(x)$: $\exists\, x_c \in (x,x^*)$:



I Aoo (5) I 



2 ^ ^ Vjjhy(x c )(xi x i )(xj Xj 



X ; 



ij k£ 

1 

2 ^ 



*\||2 



$$\le \tfrac{1}{2}(x-x^*)^T(C_{f2}A^TA + \nu C_{g2}I)(x-x^*) \le \tfrac{\delta}{2}(x-x^*)^T(C_{f2}A^TA + \nu C_{g2}I)\mathbf{1} = \delta_b^T(x-x^*),$$

since $x_i - x_i^* \in [0,\delta]$.

Thus, we obtain an upper bound

$$h_y(x) - h_y(x^*) \le (b + \delta_b)^T(x-x^*)$$

and the lower bound:

$$h_y(x) - h_y(x^*) \ge (b - \delta_b)^T(x-x^*).$$



□ 



Proposition 3. Let the assumptions on $f_y$, $g$ and $\delta$ in Section 5.1 hold.

Assume that $x^* = 0$, $b_i = \nabla_ih_y(x^*) > 0$ for all $i$, and that $\gamma \to 0$ and
$\nu \to 0$ as $\tau \to 0$.
Then, for any $\varepsilon \in (0,\delta)$ such that $b_{\min}\varepsilon/\tau \to \infty$,



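$$\frac{\int_{\mathcal{X}\setminus B(x^*,\varepsilon)} e^{-h_y(x)/\tau}\,dx}{\int_{\mathcal{X}} e^{-h_y(x)/\tau}\,dx} \le p\,e^{-b_{\min}\varepsilon/(\sqrt{p}\,\tau)}\,\frac{1+\Delta_1}{1+\Delta_0} + \frac{\Delta_0}{1+\Delta_0}$$
and, in particular,

$$\int_{B(x^*,\delta)} e^{-[h_y(x)-h_y(x^*)]/\tau}\,dx \ge \tau^p\prod_i b_i^{-1}\,[1+\Delta_3],$$

where $\Delta_1$ and $\Delta_3$ are defined by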



A3 fa 1/) 



-1 



1 + max Sniffy 



I _ e -mini& l <5/(v / P r ) 



J _ g- mini bi6/(^/pr) 



(35) 



(36) 



Proof of Proposition 3. Making the change of variables $v = (x - x^*)/\tau$ with
Jacobian $J = \tau^p$, we have



e -[hy(x)-h yc ^ t (x)]/r dx y T p eX P {-(6 + 5 b fv} dv 

B(x*,6)DX J B(0,S/r)n(X-x*) 



> r p / exp { -b T v } dv = t p TT br 1 TT 1 - exp { -b, 

J[o,s/(vpt)]p < > t y L L 

> ^n^ 7i [i+A 3 ] 

i 

with $\Delta_3$ defined by (36). The error $\Delta_3 \to 0$ as $\tau \to 0$, since $\delta \to 0$ and
$b_{\min}\delta/\tau \to \infty$, with $P_{y_{exact}}$ probability $\to 1$.
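
The closed form used in the lower bound above, namely $\int_{[0,c]^p}\exp\{-b^Tv\}\,dv = \prod_i b_i^{-1}[1-\exp\{-b_ic\}]$ with $c = \delta/(\sqrt{p}\,\tau)$, can be checked numerically. The sketch below does so for $p = 2$ with a midpoint rule; the function names, the grid size and the test values of $b$ and $c$ are illustrative assumptions.

```python
import math


def box_integral_closed_form(b, c):
    """prod_i (1 - exp(-b_i * c)) / b_i, the integral of exp(-b.v) over [0, c]^p."""
    out = 1.0
    for bi in b:
        out *= (1.0 - math.exp(-bi * c)) / bi
    return out


def box_integral_midpoint(b, c, n=200):
    """Two-dimensional midpoint rule for the same integral (p = 2 only)."""
    h = c / n
    total = 0.0
    for i in range(n):
        v1 = (i + 0.5) * h
        for j in range(n):
            v2 = (j + 0.5) * h
            total += math.exp(-(b[0] * v1 + b[1] * v2))
    return total * h * h


if __name__ == "__main__":
    b, c = (2.0, 0.5), 3.0
    print(box_integral_closed_form(b, c), box_integral_midpoint(b, c))
```

As $c \to \infty$ the closed form increases to $\prod_i b_i^{-1}$, which is the leading factor in the lower bound.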

Similarly, we obtain an upper bound on the following integral: 



f p-IM^O-M^*)]/" 1- ,^ < t-p 

J(Xr\B(x*,6))\B(x*,e) C UX - ' 



's/r<\\v\\<5/T, Vi>0 



exp {— 6 T t>} 



< 



< 



\\v\\>eh 



exp {— 6 T t>} (it) 



< 



t p / exp { — b T v } dv 

rp n^ 7i E ex p{-w(^)} 

pr p TT 6~ 1 exp { - min b^j ( ^/pr) } , 

i 
where $\tilde{b} = b - \delta_b$. Assume that $\delta$ is small enough so that $\tilde{b}_i > 0$ for all $i$.
Therefore,



f P - h v{ x )/ T rj r 

JB(x*,6)\B(x*,e) C aX 



< pe -6«nln E /(v^)( 1 + A x (5, 6)), 



where 

]_ _ e -max,6,;(5/(^/pr) 
Al(*,6) = -1 + IIrTT 11 r ■ T /fr ,- 

- L .- L hi + o 6) j p exp{- mini bie/{^/pT)} 

Hence, $\Delta_1$ is small if $b_{\min}\delta/\tau \to \infty$ as $\tau \to 0$.

Now we take into account the error of approximating the integral over $\mathcal{X}$
by the integral over $B(x^*,\varepsilon)$:

JX\B(x*,e) C UX _ J XnB(x* ,8)\B(x* ,e) C a,L ^ J X\B(x* ,8) C " X 

j x e ax JB(x\6)nx e ax + Jx\B(x*,s) e ax 

f p-h y {x)/T ( ] T a 

J^ng(aV)\g(xV) c ux A 

Thus, we have the required statement. 

□ 

Proof of Theorem 4. We proceed similarly as in the proof of Theorem 1.

By Strassen's theorem, for any $x$, $\rho_P(\mu_{post}(\omega),\delta_x) = \rho_K(\xi,x)$ where $\xi \sim
\mu_{post}(\omega)$. Hence, we find an upper bound on the Ky Fan distance between $\xi$
and $x^*$.

Using Proposition 3, an upper bound $\varepsilon$ on the Ky Fan distance
satisfies



f p- h y( x )l T rl r 
JB(x*,S)\B(x*, £ ) ti ^ < 

Jb ( x*,s) e~ h ^dx 



A + p exp 



eb min } 1 + Ai 



^/prj 1 + A C 



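where $\tilde\Delta_0 = \Delta_0/(1+\Delta_0)$.

An upper bound on the Ky Fan distance is the smallest $\varepsilon > 0$ such that

$$\tilde\Delta_0 \le \varepsilon, \qquad p\exp\left(-\frac{\varepsilon b_{\min}}{\sqrt{p}\,\tau}\right)\frac{1+\Delta_1}{1+\Delta_0} \le \varepsilon.$$

The last inequality implies that as $\tau/b_{\min} \to 0$, $\varepsilon \to 0$. Hence, using
Lemma 5 we have that

$$\varepsilon \le \frac{\tau\sqrt{p}}{b_{\min}}\left[\log\left(\frac{\sqrt{p}\,b_{\min}}{\tau}\right) + \log\left(\frac{1+\Delta_1}{1+\Delta_0}\right)\right].$$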



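Therefore, the Ky Fan distance is bounded from above by the maximum
of the two expressions:

$$\rho_P(\mu_{post}(\omega),\delta_{x^*}) \le \max\left\{\frac{\Delta_0}{1+\Delta_0},\ \frac{\tau\sqrt{p}}{b_{\min}(\omega)}\log\left(\frac{\sqrt{p}\,b_{\min}(\omega)}{\tau}\right)\left(1+\Delta_4(\delta,Y(\omega))\right)\right\},$$

where $\Delta_0 = \Delta_0(B(0,\delta))$ is defined by (15) and $\Delta_4$ is defined by

$$\Delta_4(\delta,y) = \frac{\log\left((1+\Delta_1)/(1+\Delta_0)\right)}{\log\left(\sqrt{p}\,b_{\min}(\omega)/\tau\right)}. \qquad (37)$$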



□ 



Proof of Theorem 5. Now we prove Theorem 5 in the notation defined in
the proof of Theorem 4.

We apply Theorem 2 with $\Omega_1 = \{\omega : \|Y(\omega) - y_{exact}\| \le \rho_K(Y,y_{exact})\}$ and
$\Omega_2 = \Omega\setminus\Omega_1$ with $P(\Omega_2) \le \rho_K(Y,y_{exact})$ by the definition of the Ky Fan distance,
with the bounds given in Theorem 5, which we modify to depend on $y$ only
via $\|y - y_{exact}\|$. Given that $b_i^* > 0$, the assumption of
the theorems that $b_i > 0$ holds on $\Omega_1$ for small enough $\tau$, as we shall show
below.

The upper bound depends on $y$ via $\|y - y_{exact}\|$, $b(\omega)$, $\Delta_0$ and $\Delta_1$.
We have that, on $\Omega_1$,

$$b_i = \sum_j A_{ji}\nabla_jf_y(Ax^*) + \nu\nabla_ig(x^*) \ge \sum_j A_{ji}\left[\nabla_jf_{y_{exact}}(Ax^*) - M_{f1}\rho_K(Y,y_{exact})\right] + \nu\nabla_ig(x^*) = b_i^* - \rho_K(Y,y_{exact})M_{f1}\sum_j A_{ji}$$
and also

hi - 6 b ,i >b*- [ PK {Y, y^)M fl + 8pC l2 \\A\\ 1}1 /2] £ A jti - v5 P C g2 /2, (38) 

3 

Note that if $\sum_j A_{j,i} = 0$, then $b_i - \delta_{b,i} = \nu[\nabla_ig(x^*) - \delta pC_{g2}/2]$, i.e. the
leading term in the lower bound is of order $\nu$. If $\sum_j A_{j,i} \ne 0$, then the
leading term in the lower bound is a positive constant $\sum_j A_{ji}\nabla_jf_y(Ax^*)$.
Denote $i^* = \arg\min_i b_i^*$ and assume that $\tau$ and $\delta$ are small enough so that
the minimum of the lower bound in (38) is also achieved at $i^*$. Introduce $\Delta_{11}$
such that

A n = [ PK (Y,y exact )M Ll + 0.58pC f ~ )2 \\A\\ 1A ] + 



mm mm 



If £. A^ = 0. 



Then, an upper bound on the Ky Fan distance is given by 
e < -^logf— ^1(1 + A 4 ) 



femin Vv^^ 

" ~b^[l^A n ] bg - An]) [1 + ^ ] ' 

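since the function $-x\log x$ increases for $x < 1/e$. This bound on $\varepsilon$ on $\Omega_1$ is
independent of $y$. The error term $\Delta_4^*$ is given by

$$\Delta_4^* = \frac{\log\left((1+\Delta_1^*)/(1+\Delta_0^*)\right)}{\log\left(\sqrt{p}\,b^*_{\min}[1-\Delta_{11}]/\tau\right)}.$$

Using the lifting Theorem 2 we have that, for small enough $\tau$, $\nu$,

$$\rho_K(\mu_{post},\delta_{x^*}) \le \max\left\{2\rho_K(Y,y_{exact}),\ \Delta_0^*,\ -\frac{\tau\sqrt{p}}{b^*_{\min}}\log\left(\frac{\tau}{\sqrt{p}\,b^*_{\min}}\right)(1+\Delta_5^*)\right\},$$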



where 



1 a; t iog(i - A„ ; 



log 

Thus, we have the statement of Theorem 5.



□ 
.5 Auxiliary results

Define the following projections

$$P_V = V^{\dagger}V, \qquad P_{A,V} = (A^TVA)^{\dagger}A^TVA = A^{\dagger}P_VA.$$

Lemma 8. If $[A^TV_y(x)A : B(x)]$ is of full rank,

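$$\|H^{-1}(x)\| = \left[\min\left(\lambda_{\min,P_{A,V}}(A^TV_y(x)A + \nu B(x)),\ \nu\lambda_{\min,I-P_{A,V}}(B(x))\right)\right]^{-1} \le \frac{1}{\min\left[\lambda_{\min pos}(A^TV_y(x)A) + \nu\lambda_{\min,P_{A,V}}(B(x)),\ \nu\lambda_{\min,I-P_{A,V}}(B(x))\right]},$$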

r l v , / T *x|| < \\PA,vVfy(x*)\\+v\\P A y\7g(x*)\\ 

' y{ ,l1 " \ mi nMA T V y (x*)A) + v\ min , PAAr (B(x*)) 

+ \ 1 K 1 1 1 i 1 ~ Pa,v) Vf y (x*) 1 1 + 1 1 (/ - P Ay ) Vg(x< 

where $\lambda_{\min,P}(B(x)) = \min_{\|v\|=1,\,Pv=v}\|B(x)v\|$ is the smallest eigenvalue of
$B(x)$ on the range of $P$.

Proof of Lemma 8. The norm of $H^{-1}$ is given by

| = [X^^VA + uB)}- 1 = [min \\(A T VA + pB)x\\]~ 1 

\\x\\ = l 

= [min \\{A t VA + vB)P a tx + vB){I - P^xWY 1 

\\x\\=l 

= [min( min ||(/^ + ^|| min u\ \B(I - P a t)x\ \)]~ 1 

= [min(A mini p 4T (A r Vv4 + uB), v\ m in,i-P AT (B))]' 1 . 
Weyl's inequality implies that $\lambda_{\min,P_{A^T}}(A^TVA + \nu B) \ge \lambda_{\min,P_{A^T}}(A^TVA) + \nu\lambda_{\min,P_{A^T}}(B)$.

Note that since we assumed that $V_{y_{exact}}(x^*)$ is of full rank, the projection
on the range of $A^T$ coincides with the projection on the range of
$A^TV_{y_{exact}}(x^*)A$.

Now we find an upper bound on $\|H_{y_{exact}}(x^*)^{-1}\nabla h_y(x^*)\|$ using the first
statement of the lemma:

(XVVV**)!! = ||^exac t (^)" 1 (V/, / (x*)+^(x*))|| 

< ll^™*(*T X l|p X Tll^(V/ s (rr*) + i/Vy(x*))|| 

+ 1 (a:*) -1 1 |/-p aT 1 1 (/ - ^t) [V/ y (s*) + uVg(x*)] \ \ 

+ 1 ^-Tm^l( / -PAT)[V/ 2/ (^) + ^(x*)]|| 



5^ 



[||P^V/ s (z*)|| + HMV V^x* 



Xmin,pos 

+ * 1 , m ^ 1 I ( J " ^t) V/,(x*) 1 1 + 1 1 (/ - Pa t ) Vg(x' 

^rain,I-P A T\^\ x )) 

□ 

Lemma 9. 1. ||(C + < (6 + Afc(C0)- 1 ||P o a?|| + - Pc)z|| 

where $k = \mathrm{rank}(C)$ and $\lambda_k(C)$ is the smallest positive eigenvalue of $C$,
and $P_C = C^{\dagger}C$ is the projection matrix.

2. Cauchy's interlacing theorem (?): let $C = C^T$ be an $n \times n$ matrix, $L$
any $(n-k)$-dimensional linear subspace, and $C_L = P_LCP_L$. Then, for
any $j = 1,\dots,n-k$,

$$\lambda_j(C) \ge \lambda_j(C_L) \ge \lambda_{j+k}(C).$$

3. $\lambda_{\min pos}(A^TDA) \ge \min_{D_i>0}D_i\,\lambda_{\min pos}(A^TA)$, where $D$ is a diagonal matrix
with non-negative entries.

Proof of Lemma 9. 3. $\lambda_j(A^TDA) = \lambda_j(D^{1/2}AA^TD^{1/2})$, and since $j \le \mathrm{rank}(A^TDA) =
\mathrm{rank}(D^{1/2}AA^TD^{1/2})$,

$$\lambda_j(D^{1/2}AA^TD^{1/2}) \ge \min_{D_i>0}D_i\,\lambda_j(P_DAA^TP_D) \ge \min_{D_i>0}D_i\,\lambda_{j+m}(AA^T)$$

by Cauchy's interlacing theorem, where $m = \mathrm{rank}(P_D)$, $n = \dim(\mathcal{Y})$.

If j = r = rank(P^TPo), X r (A T DA) is the smallest positive eigenvalue 
of A T DA, and j + m = rank(Po) + rank (P^tPq) ^ rank(P^T). Hence 
X r+m (A T A) ^ ^rank(p T )(^ r ^)> an< ^ the latter is the smallest positive eigen- 
value of A T A. 

□ 






